Yasuhiro Horimoto	2018-12-18 16:58:47 +0900 (Tue, 18 Dec 2018)

  Revision: 16c9b1a3ae2136a1b7a18612ff81f62f8de78e23
  https://github.com/groonga/groonga/commit/16c9b1a3ae2136a1b7a18612ff81f62f8de78e23

  Message:
    doc: add explain for TokenDelimit options

  Added files:
    doc/source/example/reference/tokenizers/token-delimit-delimiter-option.log
    doc/source/example/reference/tokenizers/token-delimit-pattern-option.log
  Modified files:
    doc/source/reference/tokenizers.rst

  Added: doc/source/example/reference/tokenizers/token-delimit-delimiter-option.log (+24 -0) 100644
===================================================================
--- /dev/null
+++ doc/source/example/reference/tokenizers/token-delimit-delimiter-option.log    2018-12-18 16:58:47 +0900 (eaf79fcb1)
@@ -0,0 +1,24 @@
+Execution example::
+
+  tokenize 'TokenDelimit("delimiter", ",")' "Hello,Wold"
+  # [
+  #   [
+  #     0,
+  #     1337566253.89858,
+  #     0.000355720520019531
+  #   ],
+  #   [
+  #     {
+  #       "value": "Hello",
+  #       "position": 0,
+  #       "force_prefix": false,
+  #       "force_prefix_search": false
+  #     },
+  #     {
+  #       "value": "Wold",
+  #       "position": 1,
+  #       "force_prefix": false,
+  #       "force_prefix_search": false
+  #     }
+  #   ]
+  # ]

  Added: doc/source/example/reference/tokenizers/token-delimit-pattern-option.log (+24 -0) 100644
===================================================================
--- /dev/null
+++ doc/source/example/reference/tokenizers/token-delimit-pattern-option.log    2018-12-18 16:58:47 +0900 (505c59546)
@@ -0,0 +1,24 @@
+Execution example::
+
+  tokenize 'TokenDelimit("pattern", "\\.\\s*")' "This is a pen. This is an apple."
+  # [
+  #   [
+  #     0,
+  #     1337566253.89858,
+  #     0.000355720520019531
+  #   ],
+  #   [
+  #     {
+  #       "value": "This is a pen.",
+  #       "position": 0,
+  #       "force_prefix": false,
+  #       "force_prefix_search": false
+  #     },
+  #     {
+  #       "value": "This is an apple.",
+  #       "position": 1,
+  #       "force_prefix": false,
+  #       "force_prefix_search": false
+  #     }
+  #   ]
+  # ]

  Modified: doc/source/reference/tokenizers.rst (+26 -0)
===================================================================
--- doc/source/reference/tokenizers.rst    2018-12-18 16:30:19 +0900 (b46f4cce3)
+++ doc/source/reference/tokenizers.rst    2018-12-18 16:58:47 +0900 (9fc076e05)
@@ -429,6 +429,32 @@ Here is an example of ``TokenDelimit``:
 .. include:: ../example/reference/tokenizers/token-delimit.log
 .. tokenize TokenDelimit "Groonga full-text-search HTTP" NormalizerAuto
 
+``TokenDelimit`` also accepts options.
+``TokenDelimit`` has a ``delimiter`` option and a ``pattern`` option.
+The ``delimiter`` option splits tokens on the specified characters.
+
+For example, ``Hello,Wold`` is tokenized to ``Hello`` and ``Wold``
+with the ``delimiter`` option as below.
+
+.. groonga-command
+.. include:: ../example/reference/tokenizers/token-delimit-delimiter-option.log
+.. tokenize 'TokenDelimit("delimiter", ",")' "Hello,Wold"
+
+The ``pattern`` option splits tokens with a regular expression.
+You can exclude needless spaces with the ``pattern`` option.
+
+For example, ``This is a pen. This is an apple.`` is tokenized to ``This is a pen.`` and
+``This is an apple.`` with the ``pattern`` option as below.
+
+Normally, when ``This is a pen. This is an apple.`` is split by ``.``,
+a needless space is included at the beginning of "This is an apple.".
+
+You can exclude the needless spaces with the ``pattern`` option as in the example below.
+
+.. groonga-command
+.. include:: ../example/reference/tokenizers/token-delimit-pattern-option.log
+.. tokenize 'TokenDelimit("pattern", "\\.\\s*")' "This is a pen. This is an apple."
+
 .. _token-delimit-null:

 ``TokenDelimitNull``
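As a supplementary illustration (not part of the commit), the splitting behavior the diff above documents can be approximated in Python. The function names here are this sketch's own inventions, and it is only an analogy, not Groonga's implementation. Note that the commit's execution log keeps the trailing ``.`` in each token, so the regex sketch matches only the inter-sentence whitespace via a lookbehind rather than the documented ``\.\s*`` pattern:

```python
import re

def split_with_delimiter(text, delimiter):
    # Like TokenDelimit("delimiter", ...): split on a literal separator
    # string, dropping empty tokens.
    return [token for token in text.split(delimiter) if token]

def split_with_pattern(text, pattern):
    # Like TokenDelimit("pattern", ...): split on a regular expression,
    # dropping empty tokens. With a lookbehind such as r"(?<=\.)\s+",
    # the "." stays attached to each token, matching the log above.
    return [token for token in re.split(pattern, text) if token]

print(split_with_delimiter("Hello,Wold", ","))
# ['Hello', 'Wold']
print(split_with_pattern("This is a pen. This is an apple.", r"(?<=\.)\s+"))
# ['This is a pen.', 'This is an apple.']
```

Both results correspond token-for-token to the ``value`` fields in the two execution-example logs added by this commit.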