groonga/groonga at 2271225 [master] doc: add explain about complex pattern (Groonga-commit) - Groonga - fulltext search engine.

Yasuhiro Horimoto	2018-12-19 11:33:59 +0900 (Wed, 19 Dec 2018)

  Revision: 2271225a564b97090c29ca1975f11c00b45f217e
  https://github.com/groonga/groonga/commit/2271225a564b97090c29ca1975f11c00b45f217e

  Message:
    doc: add explain about complex pattern

  Modified files:
    doc/locale/ja/LC_MESSAGES/reference.po
    doc/source/reference/tokenizers.rst

  Modified: doc/locale/ja/LC_MESSAGES/reference.po (+68 -0)
===================================================================

--- doc/locale/ja/LC_MESSAGES/reference.po    2018-12-19 09:37:14 +0900 (b9a5e84d0)
+++ doc/locale/ja/LC_MESSAGES/reference.po    2018-12-19 11:33:59 +0900 (e0c668694)
@@ -27311,6 +27311,74 @@ msgstr ""
 "以下の例の用に ``pattern`` オプションを使うことで、その不要な空白を除去できま"
 "す。"
 
+msgid "You can extract tokens under complex conditions with the ``pattern`` option."
+msgstr "``pattern`` オプションを使って複雑な条件でトークンを抽出できます。"
+
+msgid ""
+"For example, ``これはペンですか！？リンゴですか？「リンゴです。」`` is "
+"tokenized into ``これはペンですか``, ``リンゴですか``, and ``「リンゴです。」`` "
+"with the ``pattern`` option as below."
+msgstr ""
+"例えば、以下のように ``pattern`` オプションを使って、 ``これはペンです"
+"か！？リンゴですか？「リンゴです。」`` を ``これはペンですか`` と ``リンゴで"
+"すか`` 、 ``「リンゴです。」`` とトークナイズします。"
+
+msgid ""
+"The ``\\\\s*`` at the end of the above regular expression matches 0 or more "
+"spaces after a delimiter."
+msgstr ""
+"上記の正規表現の末尾の ``\\\\s*`` は、区切り文字の後ろの0個以上の空白にマッチ"
+"します。"
+
+msgid ""
+"``[。！？]+`` matches 1 or more of ``。``, ``！``, or ``？``. For example, "
+"``[。！？]+`` matches the ``！？`` in ``これはペンですか！？``."
+msgstr ""
+"``[。！？]+`` は、1個以上の ``。`` または ``！``、 ``？`` にマッチします。例"
+"えば、 ``[。！？]+`` は ``これはペンですか！？`` の ``！？`` にマッチします。"
+
+msgid ""
+"``(?![）」])`` is a negative lookahead. ``(?![）」])`` matches if the next "
+"character is neither ``）`` nor ``」``. A negative lookahead is interpreted "
+"in combination with the regular expression just before it."
+msgstr ""
+"``(?![）」])`` は否定先読みです。 ``(?![）」])`` は ``）`` または ``」`` に"
+"マッチしない場合にマッチします。否定先読みは直前の正規表現と合わせて解釈しま"
+"す。"
+
+msgid "Therefore, it is interpreted as ``[。！？]+(?![）」])``."
+msgstr "したがって、 ``[。！？]+(?![）」])`` を解釈します。"
+
+msgid ""
+"``[。！？]+(?![）」])`` matches ``。``, ``！``, or ``？`` only if it is not "
+"followed by ``）`` or ``」``."
+msgstr ""
+"``[。！？]+(?![）」])`` は、``。`` または ``！``、 ``？`` の後ろに ``）`` ま"
+"たは ``」`` が無い場合にマッチします。"
+
+msgid ""
+"In other words, ``[。！？]+(?![）」])`` matches the ``。`` in ``これはペンです"
+"か。`` but doesn't match the ``。`` in ``「リンゴです。」``, because there "
+"is a ``」`` after the ``。``."
+msgstr ""
+"つまり、 ``[。！？]+(?![）」])`` は、 ``これはペンですか。`` の ``。`` にマッ"
+"チしますが、 ``「リンゴです。」`` の ``。`` にはマッチしません。 ``。`` の後"
+"ろに ``」`` があるためです。"
+
+msgid "``[\\\\r\\\\n]+`` matches 1 or more newline characters."
+msgstr "``[\\\\r\\\\n]+`` は、1個以上の改行文字にマッチします。"
+
+msgid ""
+"In conclusion, ``([。！？]+(?![）」])|[\\\\r\\\\n]+)\\\\s*`` uses ``。``, "
+"``！``, ``？``, and newline characters as delimiters. However, ``。``, "
+"``！``, and ``？`` are not delimiters if they are followed by ``）`` or "
+"``」``."
+msgstr ""
+"まとめると、 ``([。！？]+(?![）」])|[\\\\r\\\\n]+)\\\\s*`` は、 ``。`` と "
+"``！`` と ``？``、 改行文字を区切り文字としています。ただし、 ``。`` または "
+"``！``、 ``？`` の後ろに ``）`` または ``」`` がある場合は、 ``。`` や ``！"
+"``、 ``？`` は区切り文字としません。"
+
 msgid ""
 "``TokenDelimitNull`` is similar to :ref:`token-delimit`. The difference "
 "between them is separator character. :ref:`token-delimit` uses space "

  Modified: doc/source/reference/tokenizers.rst (+28 -0)
===================================================================
--- doc/source/reference/tokenizers.rst    2018-12-19 09:37:14 +0900 (b96b950f2)
+++ doc/source/reference/tokenizers.rst    2018-12-19 11:33:59 +0900 (2fc150fe7)
@@ -457,6 +457,34 @@ You can except the needless spaces by a ``pattern`` option as below example.
 .. include:: ../example/reference/tokenizers/token-delimit-pattern-option.log
 .. tokenize 'TokenDelimit("pattern", "\\.\\s*")' "This is a pen. This is an apple."
 
+You can extract tokens under complex conditions with the ``pattern`` option.
+
+For example, ``これはペンですか！？リンゴですか？「リンゴです。」`` is tokenized into ``これはペンですか``, ``リンゴですか``, and ``「リンゴです。」`` with the ``pattern`` option as below.
+
+.. groonga-command
+.. include:: ../example/reference/tokenizers/token-delimit-pattern-option-with-complex-pattern.log
+.. tokenize 'TokenDelimit("pattern", "([。！？]+(?![）」])|[\\r\\n]+)\\s*")' "これはペンですか！？リンゴですか？「リンゴです。」"
+
+The ``\\s*`` at the end of the above regular expression matches 0 or more spaces after a delimiter.
+
+``[。！？]+`` matches 1 or more of ``。``, ``！``, or ``？``.
+For example, ``[。！？]+`` matches the ``！？`` in ``これはペンですか！？``.
+
+``(?![）」])`` is a negative lookahead.
+``(?![）」])`` matches if the next character is neither ``）`` nor ``」``.
+A negative lookahead is interpreted in combination with the regular expression just before it.
+
+Therefore, it is interpreted as ``[。！？]+(?![）」])``.
+
+``[。！？]+(?![）」])`` matches ``。``, ``！``, or ``？`` only if it is not followed by ``）`` or ``」``.
+
+In other words, ``[。！？]+(?![）」])`` matches the ``。`` in ``これはペンですか。`` but doesn't match the ``。`` in ``「リンゴです。」``,
+because there is a ``」`` after the ``。``.
+
+``[\\r\\n]+`` matches 1 or more newline characters.
+
+In conclusion, ``([。！？]+(?![）」])|[\\r\\n]+)\\s*`` uses ``。``, ``！``, ``？``, and newline characters as delimiters. However, ``。``, ``！``, and ``？`` are not delimiters if they are followed by ``）`` or ``」``.
+
 .. _token-delimit-null:
 
 ``TokenDelimitNull``
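
The delimiter behavior described in the tokenizers.rst change above can be approximated outside Groonga. The following is a minimal sketch, assuming that Python's ``re`` module treats the character classes, the negative lookahead, and ``\s*`` the same way as the regular expression engine used by ``TokenDelimit``. The ``split_tokens`` helper is a hypothetical name used only for illustration and is not part of Groonga; the capturing group is written as non-capturing here so that ``re.split`` does not return the delimiters themselves.

```python
import re

# Same pattern as in the TokenDelimit example, written as a Python raw string.
# Assumption: Python's re module handles these constructs like Groonga's engine.
PATTERN = re.compile(r"(?:[。！？]+(?![）」])|[\r\n]+)\s*")


def split_tokens(text):
    """Split text on delimiter matches and drop empty pieces (hypothetical helper)."""
    return [piece for piece in PATTERN.split(text) if piece]


# The trailing "。" is followed by "」", so the lookahead rejects it as a delimiter.
print(split_tokens("これはペンですか！？リンゴですか？「リンゴです。」"))
```

The sentence inside the brackets stays in one piece because its ``。`` is directly followed by ``」``.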