[Groonga-commit] groonga/groonga [master] doc: add documentation about normalizers


Kouhei Sutou null+****@clear*****
Thu Dec 27 17:12:05 JST 2012


Kouhei Sutou	2012-12-27 17:12:05 +0900 (Thu, 27 Dec 2012)

  New Revision: d6abcdee0892c4dcb0ff28ee4469a7c01cbcfe68
  https://github.com/groonga/groonga/commit/d6abcdee0892c4dcb0ff28ee4469a7c01cbcfe68

  Log:
    doc: add documentation about normalizers

  Added files:
    doc/source/reference/normalizers.txt
  Modified files:
    doc/source/reference.txt

  Modified: doc/source/reference.txt (+1 -0)
===================================================================
--- doc/source/reference.txt    2012-12-27 17:09:28 +0900 (613a1c1)
+++ doc/source/reference.txt    2012-12-27 17:12:05 +0900 (7773395)
@@ -13,6 +13,7 @@
    reference/command
    reference/type
    reference/tables
+   reference/normalizers
    reference/tokenizers
    reference/query_expanders
    reference/pseudo_column

  Added: doc/source/reference/normalizers.txt (+122 -0) 100644
===================================================================
--- /dev/null
+++ doc/source/reference/normalizers.txt    2012-12-27 17:12:05 +0900 (85584f0)
@@ -0,0 +1,122 @@
+.. -*- rst -*-
+
+.. highlightlang:: none
+
+.. groonga-command
+.. database: normalizers
+
+Normalizers
+===========
+
+Summary
+-------
+
+Groonga has a normalizer module. It is used when tokenizing text and
+storing table keys. For example, ``A`` and ``a`` are processed as the
+same character after normalization.
+
+Normalizer modules can be added as plugins. You can customize text
+normalization by registering your own normalizer plugins with groonga.
+
+A normalizer module is attached to a table. A table can have zero or
+one normalizer module. You can attach a normalizer module to a table
+with the :ref:`table-create-normalizer` option of
+:doc:`/reference/commands/table_create`.
+
+Here is an example ``table_create`` that uses the ``NormalizerAuto``
+normalizer module:
+
+.. groonga-command
+.. include:: ../example/reference/normalizers/example-table-create.log
+.. table_create Dictionary TABLE_HASH_KEY ShortText --normalizer NormalizerAuto
+
+.. note::
+
+   Groonga 2.0.9 or earlier doesn't have the ``--normalizer`` option in
+   ``table_create``. The ``KEY_NORMALIZE`` flag was used instead.
+
+   You can open an old database with groonga 2.1.0 or later. An old
+   database is a database created by groonga 2.0.9 or earlier.
+   However, once you have opened an old database with groonga 2.1.0 or
+   later, you can no longer open it with groonga 2.0.9 or earlier. When
+   groonga 2.1.0 or later opens the old database, it converts the
+   ``KEY_NORMALIZE`` flag information in it to normalizer information,
+   so groonga 2.0.9 or earlier can no longer find the
+   ``KEY_NORMALIZE`` flag information in that database.
+
+Keys of a table that has a normalizer module are normalized:
+
+.. groonga-command
+.. include:: ../example/reference/normalizers/example-load.log
+.. load --table Dictionary
+.. [
+.. {"_key": "Apple"},
+.. {"_key": "black"},
+.. {"_key": "COLOR"}
+.. ]
+.. select Dictionary
+
+The ``NormalizerAuto`` normalizer downcases text. For example,
+``"Apple"`` is normalized to ``"apple"``, ``"black"`` is normalized
+to ``"black"`` and ``"COLOR"`` is normalized to
+``"color"``.
+
+If a table is a lexicon for fulltext search, tokenized tokens are
+normalized too, because tokens are stored as table keys and table keys
+are normalized as described above.
+
+Built-in normalizers
+--------------------
+
+Here is a list of built-in normalizers:
+
+  * ``NormalizerAuto``
+  * ``NormalizerNFKC51``
+
+``NormalizerAuto``
+^^^^^^^^^^^^^^^^^^
+
+Normally you should use the ``NormalizerAuto`` normalizer.
+``NormalizerAuto`` is the normalizer that groonga 2.0.9 or earlier
+used. The ``KEY_NORMALIZE`` flag in ``table_create`` on groonga 2.0.9
+or earlier is equivalent to the ``--normalizer NormalizerAuto`` option
+in ``table_create`` on groonga 2.1.0 or later.
+
+``NormalizerAuto`` supports all encodings. It uses Unicode NFKC
+(Normalization Form Compatibility Composition) for UTF-8 encoded
+text. For other encodings, it uses its own encoding-specific
+normalization, whose results are similar to
+NFKC.
+
+For example, a half-width katakana letter (such as U+FF76 HALFWIDTH
+KATAKANA LETTER KA) followed by a half-width katakana voiced sound mark
+(U+FF9E HALFWIDTH KATAKANA VOICED SOUND MARK) is normalized to a
+full-width katakana letter with a voiced sound mark (U+30AC KATAKANA
+LETTER GA). The former is two characters, but the latter is one
+character.
+
+Here is an example that uses ``NormalizerAuto`` normalizer:
+
+.. groonga-command
+.. include:: ../example/reference/normalizers/normalizer-auto.log
+.. table_create NormalLexicon TABLE_HASH_KEY ShortText --normalizer NormalizerAuto
+
+``NormalizerNFKC51``
+^^^^^^^^^^^^^^^^^^^^
+
+``NormalizerNFKC51`` normalizes text with Unicode NFKC (Normalization
+Form Compatibility Composition) for Unicode version 5.1. It supports
+only UTF-8 encoding.
+
+Normally you don't need to use ``NormalizerNFKC51`` explicitly. You can
+use ``NormalizerAuto`` instead.
+
+Here is an example that uses ``NormalizerNFKC51`` normalizer:
+
+.. groonga-command
+.. include:: ../example/reference/normalizers/normalizer-nfkc51.log
+.. table_create NormalLexicon TABLE_HASH_KEY ShortText --normalizer NormalizerNFKC51
+
+See also
+--------
+
+* :doc:`/reference/commands/table_create`
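
The downcasing and NFKC behavior described for ``NormalizerAuto`` can be
approximated outside groonga. The Python sketch below assumes that, for
UTF-8 text, ``NormalizerAuto`` behaves like Unicode NFKC followed by
downcasing; the function name ``approximate_normalizer_auto`` is
hypothetical and this is not groonga's implementation. Python's
``unicodedata`` module also follows a newer Unicode version than the 5.1
tables used by ``NormalizerNFKC51``, so results may differ for some
characters.

```python
import unicodedata


def approximate_normalizer_auto(text: str) -> str:
    """Hypothetical, rough approximation of NormalizerAuto for UTF-8 text.

    Assumption: NFKC followed by lowercasing. This is not groonga's code;
    groonga ships its own Unicode 5.1 NFKC tables, while Python's
    unicodedata follows a newer Unicode version.
    """
    return unicodedata.normalize("NFKC", text).lower()


if __name__ == "__main__":
    # The keys loaded into the Dictionary table in the documentation example.
    for key in ["Apple", "black", "COLOR"]:
        print(key, "->", approximate_normalizer_auto(key))

    # Half-width KA (U+FF76) + half-width voiced sound mark (U+FF9E):
    # NFKC turns these two characters into one full-width GA (U+30AC).
    half_width = "\uFF76\uFF9E"
    full_width = approximate_normalizer_auto(half_width)
    print(len(half_width), "->", len(full_width), hex(ord(full_width)))
```

Running it should print ``apple``, ``black`` and ``color`` for the three
``Dictionary`` keys, and show the two half-width characters becoming the
single character U+30AC.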