[Groonga-commit] groonga/groonga at cea6796 [master] doc: use more meaningful example

Yasuhiro Horimoto null+****@clear*****
Fri Jan 4 16:02:16 JST 2019


Yasuhiro Horimoto	2019-01-04 16:02:16 +0900 (Fri, 04 Jan 2019)

  Revision: cea6796bca7e1a709af9e066e211c59ec55e7fd4
  https://github.com/groonga/groonga/commit/cea6796bca7e1a709af9e066e211c59ec55e7fd4

  Message:
    doc: use more meaningful example

  Added files:
    doc/source/example/reference/tokenizers/token-unigram-non-ascii.log
  Modified files:
    doc/locale/ja/LC_MESSAGES/reference.po
    doc/source/reference/tokenizers/token_unigram.rst

  Modified: doc/locale/ja/LC_MESSAGES/reference.po (+12 -3)
===================================================================
--- doc/locale/ja/LC_MESSAGES/reference.po    2019-01-04 15:26:38 +0900 (c3bf527d6)
+++ doc/locale/ja/LC_MESSAGES/reference.po    2019-01-04 16:02:16 +0900 (c3f79ee41)
@@ -27804,11 +27804,20 @@ msgid "``TokenUnigram`` hasn't parameter::"
 msgstr "``TokenUnigram`` には、引数がありません。"
 
 msgid ""
-":ref:`token-bigram` uses 2 characters per token. ``TokenUnigram`` uses 1 "
+"If normalizer is used, ``TokenUnigram`` uses white-space-separate like "
+"tokenize method for ASCII characters. ``TokenUnigram`` uses unigram tokenize "
+"method for non-ASCII characters."
+msgstr ""
+"ノーマライザーを使っている場合は ``TokenUnigram`` はASCIIの文字には空白区切り"
+"のようなトークナイズ方法を使います。非ASCII文字にはユニグラムのトークナイズ方"
+"法を使います。"
+
+msgid ""
+"If ``TokenUnigram`` tokenize non-ASCII charactors, ``TokenUnigram`` uses 1 "
 "character per token as below example."
 msgstr ""
-":ref:`token-bigram` は各トークンが2文字ですが、以下の例のように "
-"``TokenUnigram`` は各トークンが1文字です。"
+"``TokenUnigram`` が非ASCII文字をトークナイズすると、以下の例のように "
+"``TokenUnigram`` は各トークンが1文字となります。"
 
 msgid "Tuning"
 msgstr "チューニング"

  Added: doc/source/example/reference/tokenizers/token-unigram-non-ascii.log (+48 -0) 100644
===================================================================
--- /dev/null
+++ doc/source/example/reference/tokenizers/token-unigram-non-ascii.log    2019-01-04 16:02:16 +0900 (6f51efe71)
@@ -0,0 +1,48 @@
+Execution example::
+
+  tokenize TokenUnigram "日本語の勉強" NormalizerAuto --output_pretty yes
+  # [
+  #   [
+  #     0,
+  #     1546584495.218799,
+  #     0.0002140998840332031
+  #   ],
+  #   [
+  #     {
+  #       "value": "日",
+  #       "position": 0,
+  #       "force_prefix": false,
+  #       "force_prefix_search": false
+  #     },
+  #     {
+  #       "value": "本",
+  #       "position": 1,
+  #       "force_prefix": false,
+  #       "force_prefix_search": false
+  #     },
+  #     {
+  #       "value": "語",
+  #       "position": 2,
+  #       "force_prefix": false,
+  #       "force_prefix_search": false
+  #     },
+  #     {
+  #       "value": "の",
+  #       "position": 3,
+  #       "force_prefix": false,
+  #       "force_prefix_search": false
+  #     },
+  #     {
+  #       "value": "勉",
+  #       "position": 4,
+  #       "force_prefix": false,
+  #       "force_prefix_search": false
+  #     },
+  #     {
+  #       "value": "強",
+  #       "position": 5,
+  #       "force_prefix": false,
+  #       "force_prefix_search": false
+  #     }
+  #   ]
+  # ]
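
For contrast with the unigram output above (not part of this commit), the
same input can be run through :ref:`token-bigram` (a sketch; output omitted):

  tokenize TokenBigram "日本語の勉強" NormalizerAuto

This should yield overlapping two-character tokens such as ``日本`` and
``本語`` instead of the single-character tokens in the new example.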

  Modified: doc/source/reference/tokenizers/token_unigram.rst (+7 -3)
===================================================================
--- doc/source/reference/tokenizers/token_unigram.rst    2019-01-04 15:26:38 +0900 (ea91a094a)
+++ doc/source/reference/tokenizers/token_unigram.rst    2019-01-04 16:02:16 +0900 (8fc636610)
@@ -26,9 +26,13 @@ Syntax
 Usage
 -----
 
-:ref:`token-bigram` uses 2 characters per
-token. ``TokenUnigram`` uses 1 character per token as below example.
+If a normalizer is used, ``TokenUnigram`` uses a white-space-separation-like
+tokenize method for ASCII characters. ``TokenUnigram`` uses a unigram
+tokenize method for non-ASCII characters.
+
+If ``TokenUnigram`` tokenizes non-ASCII characters, ``TokenUnigram`` uses
+1 character per token as in the example below.
 
 .. groonga-command
-.. include:: ../../example/reference/tokenizers/token-unigram.log
+.. include:: ../../example/reference/tokenizers/token-unigram-non-ascii.log
 .. tokenize TokenUnigram "100cents!!!" NormalizerAuto
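
The unchanged command comment above still records the ASCII example.
Re-running it (a sketch; output omitted):

  tokenize TokenUnigram "100cents!!!" NormalizerAuto

Per the white-space-separation-like behavior described in Usage, this should
split at character-type boundaries into tokens like ``100``, ``cents``, and
``!!!`` rather than eleven single-character tokens.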