[Groonga-commit] groonga/groonga-normalizer-mysql at 3f747e9 [master] Describe about NormalizerMySQLUnicodeExcept...

Kouhei Sutou null+****@clear*****
Mon May 27 16:19:48 JST 2013


Kouhei Sutou	2013-05-27 16:19:48 +0900 (Mon, 27 May 2013)

  New Revision: 3f747e9960f0f64c021da378c98ea80759b827a9
  https://github.com/groonga/groonga-normalizer-mysql/commit/3f747e9960f0f64c021da378c98ea80759b827a9

  Message:
    Describe about NormalizerMySQLUnicodeExcept...

  Modified files:
    README.md

  Modified: README.md (+49 -3)
===================================================================
--- README.md    2013-05-27 15:59:39 +0900 (72a31af)
+++ README.md    2013-05-27 16:19:48 +0900 (ef96fac)
@@ -7,11 +7,57 @@ groonga-normalizer-mysql
 ## Description
 
 Groonga-normalizer-mysql is a groonga plugin. It provides MySQL
-compatible normalizers to groonga. They are `NormalizerMySQLGeneralCI`
-and `NormalizerMySQLUnicodeCI`. `NormalizerMySQLGeneralCI` corresponds
-to `utf8mb4_general_ci`.  `NormalizerMySQLUnicodeCI` corresponds to
+compatible normalizers and a custom normalizer to groonga.
+
+MySQL compatible normalizers are `NormalizerMySQLGeneralCI` and
+`NormalizerMySQLUnicodeCI`. `NormalizerMySQLGeneralCI` corresponds to
+`utf8mb4_general_ci`.  `NormalizerMySQLUnicodeCI` corresponds to
 `utf8mb4_unicode_ci`.
 
+The custom normalizer is
+`NormalizerMySQLUnicodeCIExceptKanaCIKanaWithVoicedSoundMark`. The
+name is long but self-descriptive. It is a variant of
+`NormalizerMySQLUnicodeCI` with different behaviors. They are listed
+below.
+
+* `NormalizerMySQLUnicodeCI` normalizes all small Hiragana such as `ぁ`,
+  `っ` to Hiragana such as `あ`, `つ`.
+  `NormalizerMySQLUnicodeCIExceptKanaCIKanaWithVoicedSoundMark`
+  doesn't normalize `ぁ` to `あ` nor `っ` to `つ`. `ぁ` and `あ` are
+  different characters. `っ` and `つ` are also different characters.
+  This behavior is described by `ExceptKanaCI` in the long name.  The
+  following behaviors are described by
+  `ExceptKanaWithVoicedSoundMark` in the long name.
+* `NormalizerMySQLUnicodeCI` normalizes all Hiragana with voiced sound
+  mark such as `が` to Hiragana without voiced sound mark such as `か`.
+  `NormalizerMySQLUnicodeCIExceptKanaCIKanaWithVoicedSoundMark` doesn't
+  normalize `が` to `か`. `が` and `か` are different characters.
+* `NormalizerMySQLUnicodeCI` normalizes all Hiragana with semi-voiced sound
+  mark such as `ぱ` to Hiragana without semi-voiced sound mark such as `は`.
+  `NormalizerMySQLUnicodeCIExceptKanaCIKanaWithVoicedSoundMark` doesn't
+  normalize `ぱ` to `は`. `ぱ` and `は` are different characters.
+* `NormalizerMySQLUnicodeCI` normalizes all Katakana with voiced sound
+  mark such as `ガ` to Katakana without voiced sound mark such as `カ`.
+  `NormalizerMySQLUnicodeCIExceptKanaCIKanaWithVoicedSoundMark` doesn't
+  normalize `ガ` to `カ`. `ガ` and `カ` are different characters.
+* `NormalizerMySQLUnicodeCI` normalizes all Katakana with semi-voiced sound
+  mark such as `パ` to Katakana without semi-voiced sound mark such as `ハ`.
+  `NormalizerMySQLUnicodeCIExceptKanaCIKanaWithVoicedSoundMark` doesn't
+  normalize `パ` to `ハ`. `パ` and `ハ` are different characters.
+* `NormalizerMySQLUnicodeCI` normalizes all halfwidth Katakana with
+  voiced sound mark such as `ガ` to halfwidth Katakana without voiced
+  sound mark such as `カ`.
+  `NormalizerMySQLUnicodeCIExceptKanaCIKanaWithVoicedSoundMark`
+  normalizes all halfwidth Katakana with voiced sound mark such as `ガ`
+  to fullwidth Katakana with voiced sound mark such as `ガ`.
+
+`NormalizerMySQLUnicodeCIExceptKanaCIKanaWithVoicedSoundMark` is a MySQL
+incompatible normalizer but it is useful for Japanese text. For
+example, `ふらつく` and `ブラック` have different
+meanings. `NormalizerMySQLUnicodeCI` identifies `ふらつく` with
+`ブラック` but `NormalizerMySQLUnicodeCIExceptKanaCIKanaWithVoicedSoundMark`
+doesn't identify them.
+
 ## Install
 
 ### Debian GNU/Linux
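
The kana behaviors described in the README change above can be approximated with a small Python sketch. This is not the plugin's implementation and does not use groonga. The function names `toy_unicode_ci` and `toy_except_kana` are hypothetical, the small-kana table is a partial list, and folding everything to Hiragana as the representative form is an assumption made only for illustration. It models the kana handling only and nothing else the real normalizers do.

```python
import unicodedata

# Small Hiragana -> full-size Hiragana (partial table, illustration only).
SMALL_TO_LARGE = str.maketrans("ぁぃぅぇぉっゃゅょゎ", "あいうえおつやゆよわ")
# Combining voiced and semi-voiced sound marks.
SOUND_MARKS = {"\u3099", "\u309a"}


def katakana_to_hiragana(text):
    # Katakana U+30A1..U+30F6 sit exactly 0x60 above their Hiragana counterparts.
    return "".join(
        chr(ord(ch) - 0x60) if "\u30a1" <= ch <= "\u30f6" else ch for ch in text
    )


def toy_unicode_ci(text):
    """Toy model of the kana folding described for NormalizerMySQLUnicodeCI."""
    # NFKD splits が into か + U+3099 and widens halfwidth Katakana.
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if ch not in SOUND_MARKS)
    return katakana_to_hiragana(stripped).translate(SMALL_TO_LARGE)


def toy_except_kana(text):
    """Toy model of the ExceptKanaCIKanaWithVoicedSoundMark variant."""
    # NFKC widens halfwidth Katakana and recomposes the sound marks,
    # so small kana and (semi-)voiced kana stay distinct.
    return katakana_to_hiragana(unicodedata.normalize("NFKC", text))


print(toy_unicode_ci("ふらつく") == toy_unicode_ci("ブラック"))    # True
print(toy_except_kana("ふらつく") == toy_except_kana("ブラック"))  # False
```

In this sketch the first comparison matches because the voiced sound mark and the small `ッ` are both folded away, while the second keeps them and so tells the two words apart.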