[Groonga-commit] groonga/groonga at 1dd039b [master] doc: use more meaningful example for TokenTrigram

Yasuhiro Horimoto null+****@clear*****
Fri Jan 4 16:25:48 JST 2019


Yasuhiro Horimoto	2019-01-04 16:25:48 +0900 (Fri, 04 Jan 2019)

  Revision: 1dd039baf602fc2ac49a390ebcfb7e3a6d5dd59f
  https://github.com/groonga/groonga/commit/1dd039baf602fc2ac49a390ebcfb7e3a6d5dd59f

  Message:
    doc: use more meaningful example for TokenTrigram

  Added files:
    doc/source/example/reference/tokenizers/token-trigram-non-ascii.log
  Modified files:
    doc/locale/ja/LC_MESSAGES/reference.po
    doc/source/reference/tokenizers/token_trigram.rst

  Modified: doc/locale/ja/LC_MESSAGES/reference.po (+20 -4)
===================================================================
--- doc/locale/ja/LC_MESSAGES/reference.po    2019-01-04 16:18:19 +0900 (c3f79ee41)
+++ doc/locale/ja/LC_MESSAGES/reference.po    2019-01-04 16:25:48 +0900 (b5425c354)
@@ -27784,11 +27784,20 @@ msgid "``TokenTrigram`` hasn't parameter::"
 msgstr "``TokenTrigram`` には、引数がありません。"
 
 msgid ""
-":ref:`token-bigram` uses 2 characters per token. ``TokenTrigram`` uses 3 "
-"characters per token as below example."
+"If normalizer is used, ``TokenTrigram`` uses white-space-separate like "
+"tokenize method for ASCII characters. ``TokenTrigram`` uses trigram tokenize "
+"method for non-ASCII characters."
 msgstr ""
-":ref:`token-bigram` は各トークンが2文字ですが、以下の例のように "
-"``TokenTrigram`` は各トークンが3文字です。"
+"ノーマライザーを使っている場合は ``TokenTrigram`` はASCIIの文字には空白区切り"
+"のようなトークナイズ方法を使います。非ASCII文字にはトリグラムのトークナイズ方"
+"法を使います。"
+
+msgid ""
+"If ``TokenTrigram`` tokenize non-ASCII charactors, ``TokenTrigram`` uses 3 "
+"character per token as below example."
+msgstr ""
+"``TokenTrigram`` が非ASCII文字をトークナイズすると、以下の例のように "
+"``TokenTrigram`` は各トークンが3文字となります。"
 
 msgid "``TokenUnigram``"
 msgstr ""
@@ -28302,6 +28311,13 @@ msgid "``window_sum``"
 msgstr ""
 
 #~ msgid ""
+#~ ":ref:`token-bigram` uses 2 characters per token. ``TokenTrigram`` uses 3 "
+#~ "characters per token as below example."
+#~ msgstr ""
+#~ ":ref:`token-bigram` は各トークンが2文字ですが、以下の例のように "
+#~ "``TokenTrigram`` は各トークンが3文字です。"
+
+#~ msgid ""
 #~ "``TokenTrigram`` is similar to :ref:`token-bigram`. The differences "
 #~ "between them is token unit. :ref:`token-bigram` uses 2 characters per "
 #~ "token. ``TokenTrigram`` uses 3 characters per token."

  Added: doc/source/example/reference/tokenizers/token-trigram-non-ascii.log (+48 -0) 100644
===================================================================
--- /dev/null
+++ doc/source/example/reference/tokenizers/token-trigram-non-ascii.log    2019-01-04 16:25:48 +0900 (40b1f2b91)
@@ -0,0 +1,48 @@
+Execution example::
+
+  tokenize TokenTrigram "日本語の勉強" NormalizerAuto
+  # [
+  #   [
+  #     0,
+  #     1546586185.123834,
+  #     0.0003123283386230469
+  #   ],
+  #   [
+  #     {
+  #       "value": "日本語",
+  #       "position": 0,
+  #       "force_prefix": false,
+  #       "force_prefix_search": false
+  #     },
+  #     {
+  #       "value": "本語の",
+  #       "position": 1,
+  #       "force_prefix": false,
+  #       "force_prefix_search": false
+  #     },
+  #     {
+  #       "value": "語の勉",
+  #       "position": 2,
+  #       "force_prefix": false,
+  #       "force_prefix_search": false
+  #     },
+  #     {
+  #       "value": "の勉強",
+  #       "position": 3,
+  #       "force_prefix": false,
+  #       "force_prefix_search": false
+  #     },
+  #     {
+  #       "value": "勉強",
+  #       "position": 4,
+  #       "force_prefix": false,
+  #       "force_prefix_search": false
+  #     },
+  #     {
+  #       "value": "強",
+  #       "position": 5,
+  #       "force_prefix": false,
+  #       "force_prefix_search": false
+  #     }
+  #   ]
+  # ]

  Modified: doc/source/reference/tokenizers/token_trigram.rst (+8 -4)
===================================================================
--- doc/source/reference/tokenizers/token_trigram.rst    2019-01-04 16:18:19 +0900 (18a4545d0)
+++ doc/source/reference/tokenizers/token_trigram.rst    2019-01-04 16:25:48 +0900 (b1f89ad6a)
@@ -26,9 +26,13 @@ Syntax
 Usage
 -----
 
-:ref:`token-bigram` uses 2 characters per
-token. ``TokenTrigram`` uses 3 characters per token as below example.
+If normalizer is used, ``TokenTrigram`` uses white-space-separate like
+tokenize method for ASCII characters. ``TokenTrigram`` uses trigram
+tokenize method for non-ASCII characters.
+
+If ``TokenTrigram`` tokenize non-ASCII charactors, ``TokenTrigram`` uses
+3 character per token as below example.
 
 .. groonga-command
-.. include:: ../../example/reference/tokenizers/token-trigram.log
-.. tokenize TokenTrigram "10000cents!!!!!" NormalizerAuto
+.. include:: ../../example/reference/tokenizers/token-trigram-non-ascii.log
+.. tokenize TokenTrigram "日本語の勉強" NormalizerAuto
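
The token list in token-trigram-non-ascii.log can be reproduced with a short sliding-window sketch. This is only an illustrative model in Python, not Groonga's implementation: the function name `trigram_tokens` is hypothetical, NormalizerAuto is not modeled, and the ASCII handling described in the new documentation text is deliberately left out. It assumes each character is a single Unicode code point, which holds for the sample string.

```python
def trigram_tokens(text, n=3):
    """Return (position, value) pairs for a sliding window of up to n characters.

    Hypothetical helper for illustration only. Like the logged output, the
    window keeps sliding to the end of the text, so the final tokens are
    shorter than n characters.
    """
    return [(position, text[position:position + n])
            for position in range(len(text))]


if __name__ == "__main__":
    # Same input as the documented command:
    #   tokenize TokenTrigram "日本語の勉強" NormalizerAuto
    for position, value in trigram_tokens("日本語の勉強"):
        print(position, value)
```

For the six-character input this yields six tokens at positions 0 to 5, with the two trailing tokens ("勉強" and "強") shorter than three characters, as in the logged output.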