Yasuhiro Horimoto 2019-01-07 16:34:50 +0900 (Mon, 07 Jan 2019) Revision: 55e953bee1116c32947a4de00c08f9986a3ee57f https://github.com/groonga/groonga/commit/55e953bee1116c32947a4de00c08f9986a3ee57f Message: doc: add NormalizerNFKC100 Added files: doc/source/example/reference/normalizers/normalizer-nfkc100-unify-hyphen-and-prolonged-sound-mark.log doc/source/example/reference/normalizers/normalizer-nfkc100-unify-hyphen.log doc/source/example/reference/normalizers/normalizer-nfkc100-unify-kana-case-hiragana.log doc/source/example/reference/normalizers/normalizer-nfkc100-unify-kana-case-katakana.log doc/source/example/reference/normalizers/normalizer-nfkc100-unify-kana.log doc/source/example/reference/normalizers/normalizer-nfkc100-unify-katakana-bu-sounds.log doc/source/example/reference/normalizers/normalizer-nfkc100-unify-katakana-v-sounds.log doc/source/example/reference/normalizers/normalizer-nfkc100-unify-middle-dot.log doc/source/example/reference/normalizers/normalizer-nfkc100-unify-prolonged-sound-mark.log doc/source/example/reference/normalizers/normalizer-nfkc100-unify-to-romaji-complex.log doc/source/example/reference/normalizers/normalizer-nfkc100-unify-to-romaji.log doc/source/example/reference/normalizers/normalizer-nfkc100-unify-voiced-sound-mark-hiragana.log doc/source/example/reference/normalizers/normalizer-nfkc100-unify-voiced-sound-mark-katakana.log doc/source/example/reference/normalizers/normalizer-nfkc100.log doc/source/reference/normalizers/normalizer_nfkc100.rst Modified files: doc/locale/ja/LC_MESSAGES/reference.po doc/source/reference/normalizers.rst Modified: doc/locale/ja/LC_MESSAGES/reference.po (+375 -12) =================================================================== --- doc/locale/ja/LC_MESSAGES/reference.po 2019-01-05 17:28:03 +0900 (c90bc3f8d) +++ doc/locale/ja/LC_MESSAGES/reference.po 2019-01-07 16:34:50 +0900 (d947256c1) @@ -23976,6 +23976,351 @@ msgid "" "mysql>`_" msgstr "" +msgid "``NormalizerNFKC100``" +msgstr "" + +msgid 
"" +"``NormalizerNFKC100`` normalizes text by Unicode NFKC (Normalization Form " +"Compatibility Composition) for Unicode version 10.0." +msgstr "" +"``NormalizerNFKC100`` はUnicode 10.0用のUnicode NFKC(Normalization Form " +"Compatibility Composition)を使ってテキストを正規化します。" + +msgid "This normalizer can change behavior by specifying options." +msgstr "このノーマライザーはオプションを指定することで、動作を変更できます。" + +msgid "``NormalizerNFKC100`` has optional parameter::" +msgstr "``NormalizerNFKC100`` は省略可能な引数があります。::" + +msgid "No options::" +msgstr "オプションなし::" + +msgid "Specify option::" +msgstr "オプション指定::" + +msgid ":ref:`normalizer-nfkc100-unify-middle-dot` is added." +msgstr ":ref:`normalizer-nfkc100-unify-middle-dot` 追加。" + +msgid ":ref:`normalizer-nfkc100-unify-katakana-v-sounds` is added." +msgstr ":ref:`normalizer-nfkc100-unify-katakana-v-sounds` 追加。" + +msgid ":ref:`normalizer-nfkc100-unify-katakana-bu-sounds` is added." +msgstr ":ref:`normalizer-nfkc100-unify-katakana-bu-sounds` 追加。" + +msgid ":ref:`normalizer-nfkc100-unify-to-romaji` is added." +msgstr ":ref:`normalizer-nfkc100-unify-to-romaji` 追加。" + +msgid "Specify multiple options::" +msgstr "複数のオプション指定::" + +msgid "" +"``NormalizerNFKC100`` can also take multiple options as above. You can also " +"combine options other than those in the above example." +msgstr "" +"上記のように、 ``NormalizerNFKC100`` は複数のオプションを指定することもできま" +"す。上記の例以外にも複数のオプションを組み合わせて指定できます。" + +msgid "" +"Here is an example of ``NormalizerNFKC100``. ``NormalizerNFKC100`` " +"normalizes text by Unicode NFKC (Normalization Form Compatibility " +"Composition) for Unicode version 10.0." +msgstr "" +"以下は、``NormalizerNFKC100`` の使用例です。 ``NormalizerNFKC100`` はUnicode " +"10.0用のUnicode NFKC(Normalization Form Compatibility Composition)を使って" +"テキストを正規化します。" + +msgid "Here is an example of :ref:`normalizer-nfkc100-unify-kana` option." 
+msgstr "以下は :ref:`normalizer-nfkc100-unify-kana` オプションの使用例です。" + +msgid "" +"This option enables that same pronounced characters in all of full-width " +"Hiragana, full-width Katakana and half-width Katakana are regarded as the " +"same character as below." +msgstr "" +"このオプションは、以下のように同じ音となる全角ひらがな、全角カタカナ、半角カ" +"タカナの文字を同一視します。" + +msgid "Here is an example of :ref:`normalizer-nfkc100-unify-kana-case` option." +msgstr "" +"以下は :ref:`normalizer-nfkc100-unify-kana-case` オプションの使用例です。" + +msgid "" +"This option enables that large and small versions of same letters in all of " +"full-width Hiragana, full-width Katakana and half-width Katakana are " +"regarded as the same character as below." +msgstr "" +"このオプションは、以下のように、全角ひらがな、全角カタカナ、半角カタカナの小" +"さな文字を大きな文字と同一視します。" + +msgid "" +"Here is an example of :ref:`normalizer-nfkc100-unify-kana-voiced-sound-mark` " +"option." +msgstr "" +"以下は、 :ref:`normalizer-nfkc100-unify-kana-voiced-sound-mark` オプションの" +"使用例です。" + +msgid "" +"This option enables that letters with/without voiced sound mark and semi " +"voiced sound mark in all of full-width Hiragana, full-width Katakana and " +"half-width Katakana are regarded as the same character as below." +msgstr "" +"このオプションは、以下のように、全角ひらがな、全角カタカナ、半角カタカナで濁" +"点や半濁点の有無を同一視します。" + +msgid "" +"Here is an example of :ref:`normalizer-nfkc100-unify-hyphen` option. This " +"option enables normalize hyphen to \"-\" (U+002D HYPHEN-MINUS) as below." +msgstr "" +"以下は、 :ref:`normalizer-nfkc100-unify-hyphen` オプションの使用例です。この" +"オプションは、以下のように、ハイフンを\"-\" (U+002D HYPHEN-MINUS)に正規化しま" +"す。" + +msgid "" +"Here is an example of :ref:`normalizer-nfkc100-unify-prolonged-sound-mark` " +"option. This option enables normalize prolonged sound to \"ー\" (U+30FC " +"KATAKANA-HIRAGANA PROLONGED SOUND MARK) as below." 
+msgstr "" +"以下は、 :ref:`normalizer-nfkc100-unify-prolonged-sound-mark` オプションの使" +"用例です。このオプションは、以下のように長音記号を\"ー\" (U+30FC KATAKANA-" +"HIRAGANA PROLONGED SOUND MARK)に正規化します。" + +msgid "" +"Here is an example of :ref:`normalizer-nfkc100-unify-hyphen-and-prolonged-" +"sound-mark` option. This option enables normalize hyphen and prolonged sound " +"to \"-\" (U+002D HYPHEN-MINUS) as below." +msgstr "" +"以下は、:ref:`normalizer-nfkc100-unify-hyphen-and-prolonged-sound-mark` オプ" +"ションの使用例です。このオプションは、以下のように、ハイフンと長音記号を\"-" +"\" (U+002D HYPHEN-MINUS)に正規化します。" + +msgid "" +"Here is an example of :ref:`normalizer-nfkc100-unify-middle-dot` option. " +"This option enables normalize middle dot to \"·\" (U+00B7 MIDDLE DOT) as " +"below." +msgstr "" +"以下は、:ref:`normalizer-nfkc100-unify-middle-dot` オプションの使用例です。こ" +"のオプションは、中点を\"·\" (U+00B7 MIDDLE DOT)に正規化します。" + +msgid "" +"Here is an example of :ref:`normalizer-nfkc100-unify-katakana-v-sounds` " +"option. This option enables normalize \"ヴァヴィヴヴェヴォ\" to \"バビブベボ" +"\" as below." +msgstr "" +"以下は、:ref:`normalizer-nfkc100-unify-katakana-v-sounds` オプションの使用例" +"です。このオプションは、以下のように、\"ヴァヴィヴヴェヴォ\"を\"バビブベボ" +"\"に正規化します。" + +msgid "" +"Here is an example of :ref:`normalizer-nfkc100-unify-katakana-bu-sounds` " +"option. This option enables normalize \"ヴァヴィヴゥヴェヴォ\" to \"ブ\" as " +"below." +msgstr "" +"以下は、:ref:`normalizer-nfkc100-unify-katakana-bu-sounds` オプションの使用例" +"です。このオプションは、以下のように、\"ヴァヴィヴゥヴェヴォ\"を\"ブ\"に正規" +"化します。" + +msgid "" +"Here is an example of :ref:`normalizer-nfkc100-unify-to-romaji` option. This " +"option enables normalize hiragana and katakana to romaji as below." +msgstr "" +"以下は、 :ref:`normalizer-nfkc100-unify-to-romaji` オプションの使用例です。こ" +"のオプションは、以下のように、ひらがなとカタカナをローマ字に正規化します。" + +msgid "Advanced usage" +msgstr "高度な使い方" + +msgid "" +"You can output the romaji of a specific part of speech by combining " +"``TokenMecab`` and ``NormalizerNFKC100`` as below." 
+msgstr "" +"``TokenMecab`` と ``NormalizerNFKC100`` を組み合わせて使うことで、特定の品詞" +"の読みをローマ字で出力できます。" + +msgid "" +"First of all, you extract reading of a noun with excluding non-independent " +"word and suffix of person name with ``target_class`` option and " +"``include_reading`` option." +msgstr "" +"まずはじめに、``TokenMecab`` の ``target_class`` オプションと " +"``include_reading`` オプションを使って人名の接尾語と非自立語を除いた名詞を抽" +"出します。" + +msgid "" +"Next, you normalize reading of the noun that extracted with " +"``unify_to_romaji`` option of ``NormalizerNFKC100``." +msgstr "" +"次に、抽出した名詞の読みを ``NormalizerNFKC100`` の ``unify_to_romaji`` を" +"使って正規化します。" + +msgid "There are optional parameters as below." +msgstr "省略可能な引数は以下の通りです。" + +msgid "``unify_kana``" +msgstr "" + +msgid "" +"This option enables that same pronounced characters in all of full-width " +"Hiragana, full-width Katakana and half-width Katakana are regarded as the " +"same character." +msgstr "" +"このオプションは、同じ音となる全角ひらがな、全角カタカナ、半角カタカナの文字" +"を同一視します。" + +msgid "``unify_kana_case``" +msgstr "" + +msgid "" +"This option enables that large and small versions of same letters in all of " +"full-width Hiragana, full-width Katakana and half-width Katakana are " +"regarded as the same character." +msgstr "" +"このオプションは、全角ひらがな、全角カタカナ、半角カタカナの小さな文字を大き" +"な文字と同一視します。" + +msgid "``unify_kana_voiced_sound_mark``" +msgstr "" + +msgid "" +"This option enables that letters with/without voiced sound mark and semi " +"voiced sound mark in all of full-width Hiragana, full-width Katakana and " +"half-width Katakana are regarded as the same character." +msgstr "" +"このオプションは、全角ひらがな、全角カタカナ、半角カタカナで濁点や半濁点の有" +"無を同一視します。" + +msgid "``unify_hyphen``" +msgstr "" + +msgid "This option enables normalize hyphen to \"-\" (U+002D HYPHEN-MINUS)." +msgstr "" +"このオプションは、ハイフンを\"-\" (U+002D HYPHEN-MINUS)に正規化します。" + +msgid "Hyphen of the target of normalizing is as below." 
+msgstr "正規化対象のハイフンは以下の通りです。" + +msgid "\"-\" (U+002D HYPHEN-MINUS)" +msgstr "" + +msgid "\"֊\" (U+058A ARMENIAN HYPHEN)" +msgstr "" + +msgid "\"˗\" (U+02D7 MODIFIER LETTER MINUS SIGN)" +msgstr "" + +msgid "\"‐\" (U+2010 HYPHEN)" +msgstr "" + +msgid "\"—\" (U+2014 EM DASH)" +msgstr "" + +msgid "\"⁃\" (U+2043 HYPHEN BULLET)" +msgstr "" + +msgid "\"⁻\" (U+207B SUPERSCRIPT MINUS)" +msgstr "" + +msgid "\"₋\" (U+208B SUBSCRIPT MINUS)" +msgstr "" + +msgid "\"−\" (U+2212 MINUS SIGN)" +msgstr "" + +msgid "``unify_prolonged_sound_mark``" +msgstr "" + +msgid "" +"This option enables normalize prolonged sound to \"ー\" (U+30FC KATAKANA-" +"HIRAGANA PROLONGED SOUND MARK)." +msgstr "" +"このオプションは、長音記号を\"ー\" (U+30FC KATAKANA-HIRAGANA PROLONGED SOUND " +"MARK)に正規化します。" + +msgid "Prolonged sound of the target of normalizing is as below." +msgstr "正規化対象の長音記号は以下の通りです。" + +msgid "\"―\" (U+2015 HORIZONTAL BAR)" +msgstr "" + +msgid "\"─\" (U+2500 BOX DRAWINGS LIGHT HORIZONTAL)" +msgstr "" + +msgid "\"━\" (U+2501 BOX DRAWINGS HEAVY HORIZONTAL)" +msgstr "" + +msgid "\"ー\" (U+30FC KATAKANA-HIRAGANA PROLONGED SOUND MARK)" +msgstr "" + +msgid "\"ｰ\" (U+FF70 HALFWIDTH KATAKANA-HIRAGANA PROLONGED SOUND MARK)" +msgstr "" + +msgid "``unify_hyphen_and_prolonged_sound_mark``" +msgstr "" + +msgid "" +"This option enables normalize hyphen and prolonged sound to \"-\" (U+002D " +"HYPHEN-MINUS)." +msgstr "" +"このオプションは、ハイフンと長音記号を\"-\" (U+002D HYPHEN-MINUS)に正規化しま" +"す。" + +msgid "Hyphen and prolonged sound of the target normalizing is below." +msgstr "正規化対象のハイフンと長音記号は以下の通りです。" + +msgid "``unify_middle_dot``" +msgstr "" + +msgid "This option enables normalize middle dot to \"·\" (U+00B7 MIDDLE DOT)." +msgstr "このオプションは、中点を\"·\" (U+00B7 MIDDLE DOT)に正規化します。" + +msgid "Middle dot of the target of normalizing is as below." 
+msgstr "正規化対象の中点は以下の通りです。" + +msgid "\"·\" (U+00B7 MIDDLE DOT)" +msgstr "" + +msgid "\"ᐧ\" (U+1427 CANADIAN SYLLABICS FINAL MIDDLE DOT)" +msgstr "" + +msgid "\"•\" (U+2022 BULLET)" +msgstr "" + +msgid "\"∙\" (U+2219 BULLET OPERATOR)" +msgstr "" + +msgid "\"⋅\" (U+22C5 DOT OPERATOR)" +msgstr "" + +msgid "\"⸱\" (U+2E31 WORD SEPARATOR MIDDLE DOT)" +msgstr "" + +msgid "\"・\" (U+30FB KATAKANA MIDDLE DOT)" +msgstr "" + +msgid "\"･\" (U+FF65 HALFWIDTH KATAKANA MIDDLE DOT)" +msgstr "" + +msgid "``unify_katakana_v_sounds``" +msgstr "" + +msgid "This option enables normalize \"ヴァヴィヴヴェヴォ\" to \"バビブベボ\"." +msgstr "" +"このオプションは、\"ヴァヴィヴヴェヴォ\"を\"バビブベボ\"に正規化します。" + +msgid "``unify_katakana_bu_sound``" +msgstr "" + +msgid "This option enables normalize \"ヴァヴィヴゥヴェヴォ\" to \"ブ\"." +msgstr "このオプションは、\"ヴァヴィヴゥヴェヴォ\"を\"ブ\"に正規化します。" + +msgid "``unify_to_romaji``" +msgstr "" + +msgid "This option enables normalize hiragana and katakana to romaji." +msgstr "このオプションは、ひらがなとカタカナをローマ字に正規化します。" + +msgid ":doc:`../commands/normalize`" +msgstr "" + msgid "Operations" msgstr "操作方法" @@ -26691,9 +27036,6 @@ msgstr "" "以下のように、 ``TokenFilterNFKC100`` はひらがなと漢字のトークンは変換しませ" "ん。" -msgid "Advanced usage" -msgstr "高度な使い方" - msgid "" "You can output all input string as hiragana with cimbining " "``TokenFilterNFKC100`` with ``use_reading`` option of ``TokenMecab`` as " @@ -26705,9 +27047,6 @@ msgstr "" msgid "There are a required parameters ``unify_kana``." msgstr "必須の引数 ``unify_kana`` があります。" -msgid "``unify_kana``" -msgstr "" - msgid "Translate a token katakana to hiragana." msgstr "カタカナのトークンをひらがなに変換します。" @@ -28334,12 +28673,11 @@ msgstr "" msgid "``window_sum``" msgstr "" -#~ msgid "" -#~ ":ref:`token-bigram` uses 2 characters per token. ``TokenTrigram`` uses 3 " -#~ "characters per token as below example." -#~ msgstr "" -#~ ":ref:`token-bigram` は各トークンが2文字ですが、以下の例のように " -#~ "``TokenTrigram`` は各トークンが3文字です。" +#~ msgid "unify_middle_dot is added." 
+#~ msgstr "unify_middle_dot 追加。" + +#~ msgid "for Unicode version 10.0." +#~ msgstr "Unicode 10.0用" #~ msgid "" #~ "``TokenTrigram`` is similar to :ref:`token-bigram`. The differences " @@ -28349,3 +28687,28 @@ msgstr "" #~ "``TokenTrigram`` は :ref:`token-bigram` に似ています。違いはトークンの単位" #~ "です。 :ref:`token-bigram` は各トークンが2文字ですが、 ``TokenTrigram`` は" #~ "各トークンが3文字です。" + +#~ msgid "" +#~ ":ref:`token-bigram` uses 2 characters per token. ``TokenTrigram`` uses 3 " +#~ "characters per token as below example." +#~ msgstr "" +#~ ":ref:`token-bigram` は各トークンが2文字ですが、以下の例のように " +#~ "``TokenTrigram`` は各トークンが3文字です。" + +#~ msgid "" +#~ ":ref:`normalizeser-nfkc100-unify-middle-dot` is added. :ref:`normalizeser-" +#~ "nfkc100-unify-katakana-v-sounds` is added. :ref:`normalizeser-nfkc100-" +#~ "unify-katakana-bu-sounds` is added." +#~ msgstr "" +#~ ":ref:`normalizeser-nfkc100-unify-middle-dot` 追加。 :ref:`normalizeser-" +#~ "nfkc100-unify-katakana-v-sounds` 追加。 :ref:`normalizeser-nfkc100-unify-" +#~ "katakana-bu-sounds` 追加。" + +#~ msgid "unify_to_romaji is added." +#~ msgstr "unify_to_romaji 追加。" + +#~ msgid "unify_katakana_bu_sounds is added." +#~ msgstr "unify_katakana_bu_sounds 追加。" + +#~ msgid "unify_katakana_v_sounds is added." 
+#~ msgstr "unify_katakana_v_sounds 追加。" Added: doc/source/example/reference/normalizers/normalizer-nfkc100-unify-hyphen-and-prolonged-sound-mark.log (+38 -0) 100644 =================================================================== --- /dev/null +++ doc/source/example/reference/normalizers/normalizer-nfkc100-unify-hyphen-and-prolonged-sound-mark.log 2019-01-07 16:34:50 +0900 (0640b9ce9) @@ -0,0 +1,38 @@ +Execution example:: + + normalize 'NormalizerNFKC100("unify_hyphen_and_prolonged_sound_mark", true)' "-˗֊‐‑‒–⁃⁻₋− ﹣- ー—―─━ー" WITH_TYPES + # [ + # [ + # 0, + # 1546840930.462605, + # 0.0001947879791259766 + # ], + # { + # "normalized": "----------- -- ------", + # "types": [ + # "symbol", + # "symbol", + # "symbol", + # "symbol", + # "symbol", + # "symbol", + # "symbol", + # "symbol", + # "symbol", + # "symbol", + # "symbol", + # "others", + # "symbol", + # "symbol", + # "others", + # "symbol", + # "symbol", + # "symbol", + # "symbol", + # "symbol", + # "symbol" + # ], + # "checks": [ + # ] + # } + # ] Added: doc/source/example/reference/normalizers/normalizer-nfkc100-unify-hyphen.log (+28 -0) 100644 =================================================================== --- /dev/null +++ doc/source/example/reference/normalizers/normalizer-nfkc100-unify-hyphen.log 2019-01-07 16:34:50 +0900 (5641bbcd1) @@ -0,0 +1,28 @@ +Execution example:: + + normalize 'NormalizerNFKC100("unify_hyphen", true)' "-˗֊‐‑‒–⁃⁻₋−" WITH_TYPES + # [ + # [ + # 0, + # 1546840778.422051, + # 0.0001845359802246094 + # ], + # { + # "normalized": "-----------", + # "types": [ + # "symbol", + # "symbol", + # "symbol", + # "symbol", + # "symbol", + # "symbol", + # "symbol", + # "symbol", + # "symbol", + # "symbol", + # "symbol" + # ], + # "checks": [ + # ] + # } + # ] Added: doc/source/example/reference/normalizers/normalizer-nfkc100-unify-kana-case-hiragana.log (+39 -0) 100644 =================================================================== --- /dev/null +++ 
doc/source/example/reference/normalizers/normalizer-nfkc100-unify-kana-case-hiragana.log 2019-01-07 16:34:50 +0900 (11649f8b5) @@ -0,0 +1,39 @@ +Execution example:: + + normalize 'NormalizerNFKC100("unify_kana_case", true)' "ぁあぃいぅうぇえぉおゃやゅゆょよゎわゕかゖけ" WITH_TYPES + # [ + # [ + # 0, + # 1546840389.78734, + # 0.0001950263977050781 + # ], + # { + # "normalized": "ああいいううええおおややゆゆよよわわかかけけ", + # "types": [ + # "hiragana", + # "hiragana", + # "hiragana", + # "hiragana", + # "hiragana", + # "hiragana", + # "hiragana", + # "hiragana", + # "hiragana", + # "hiragana", + # "hiragana", + # "hiragana", + # "hiragana", + # "hiragana", + # "hiragana", + # "hiragana", + # "hiragana", + # "hiragana", + # "hiragana", + # "hiragana", + # "hiragana", + # "hiragana" + # ], + # "checks": [ + # ] + # } + # ] Added: doc/source/example/reference/normalizers/normalizer-nfkc100-unify-kana-case-katakana.log (+39 -0) 100644 =================================================================== --- /dev/null +++ doc/source/example/reference/normalizers/normalizer-nfkc100-unify-kana-case-katakana.log 2019-01-07 16:34:50 +0900 (b898a5e0f) @@ -0,0 +1,39 @@ +Execution example:: + + normalize 'NormalizerNFKC100("unify_kana_case", true)' "ァアィイゥウェエォオャヤュユョヨヮワヵカヶケ" WITH_TYPES + # [ + # [ + # 0, + # 1546840469.179984, + # 0.0002634525299072266 + # ], + # { + # "normalized": "アアイイウウエエオオヤヤユユヨヨワワカカケケ", + # "types": [ + # "katakana", + # "katakana", + # "katakana", + # "katakana", + # "katakana", + # "katakana", + # "katakana", + # "katakana", + # "katakana", + # "katakana", + # "katakana", + # "katakana", + # "katakana", + # "katakana", + # "katakana", + # "katakana", + # "katakana", + # "katakana", + # "katakana", + # "katakana", + # "katakana", + # "katakana" + # ], + # "checks": [ + # ] + # } + # ] Added: doc/source/example/reference/normalizers/normalizer-nfkc100-unify-kana.log (+24 -0) 100644 =================================================================== --- /dev/null +++ 
doc/source/example/reference/normalizers/normalizer-nfkc100-unify-kana.log 2019-01-07 16:34:50 +0900 (717c40217) @@ -0,0 +1,24 @@ +Execution example:: + + normalize 'NormalizerNFKC100("unify_kana", true)' "あイウェおヽヾ" WITH_TYPES + # [ + # [ + # 0, + # 1546840210.809296, + # 0.0002298355102539062 + # ], + # { + # "normalized": "あいうぇおゝゞ", + # "types": [ + # "hiragana", + # "hiragana", + # "hiragana", + # "hiragana", + # "hiragana", + # "hiragana", + # "hiragana" + # ], + # "checks": [ + # ] + # } + # ] Added: doc/source/example/reference/normalizers/normalizer-nfkc100-unify-katakana-bu-sounds.log (+23 -0) 100644 =================================================================== --- /dev/null +++ doc/source/example/reference/normalizers/normalizer-nfkc100-unify-katakana-bu-sounds.log 2019-01-07 16:34:50 +0900 (97d8f74a5) @@ -0,0 +1,23 @@ +Execution example:: + + normalize 'NormalizerNFKC100("unify_katakana_bu_sound", true)' "ヴァヴィヴヴェヴォヴ" WITH_TYPES + # [ + # [ + # 0, + # 1546841138.543078, + # 0.0001876354217529297 + # ], + # { + # "normalized": "ブブブブブブ", + # "types": [ + # "katakana", + # "katakana", + # "katakana", + # "katakana", + # "katakana", + # "katakana" + # ], + # "checks": [ + # ] + # } + # ] Added: doc/source/example/reference/normalizers/normalizer-nfkc100-unify-katakana-v-sounds.log (+23 -0) 100644 =================================================================== --- /dev/null +++ doc/source/example/reference/normalizers/normalizer-nfkc100-unify-katakana-v-sounds.log 2019-01-07 16:34:50 +0900 (3b74a4538) @@ -0,0 +1,23 @@ +Execution example:: + + normalize 'NormalizerNFKC100("unify_katakana_v_sounds", true)' "ヴァヴィヴヴェヴォヴ" WITH_TYPES + # [ + # [ + # 0, + # 1546841068.702912, + # 0.0002088546752929688 + # ], + # { + # "normalized": "バビブベボブ", + # "types": [ + # "katakana", + # "katakana", + # "katakana", + # "katakana", + # "katakana", + # "katakana" + # ], + # "checks": [ + # ] + # } + # ] Added: 
doc/source/example/reference/normalizers/normalizer-nfkc100-unify-middle-dot.log (+25 -0) 100644 =================================================================== --- /dev/null +++ doc/source/example/reference/normalizers/normalizer-nfkc100-unify-middle-dot.log 2019-01-07 16:34:50 +0900 (d85ce5e7f) @@ -0,0 +1,25 @@ +Execution example:: + + normalize 'NormalizerNFKC100("unify_middle_dot", true)' "·ᐧ•∙⋅⸱・・" WITH_TYPES + # [ + # [ + # 0, + # 1546840999.582769, + # 0.0001835823059082031 + # ], + # { + # "normalized": "········", + # "types": [ + # "symbol", + # "symbol", + # "symbol", + # "symbol", + # "symbol", + # "symbol", + # "symbol", + # "symbol" + # ], + # "checks": [ + # ] + # } + # ] Added: doc/source/example/reference/normalizers/normalizer-nfkc100-unify-prolonged-sound-mark.log (+23 -0) 100644 =================================================================== --- /dev/null +++ doc/source/example/reference/normalizers/normalizer-nfkc100-unify-prolonged-sound-mark.log 2019-01-07 16:34:50 +0900 (7a9d4a793) @@ -0,0 +1,23 @@ +Execution example:: + + normalize 'NormalizerNFKC100("unify_prolonged_sound_mark", true)' "ー—―─━ー" WITH_TYPES + # [ + # [ + # 0, + # 1546840846.654316, + # 0.0001988410949707031 + # ], + # { + # "normalized": "ーーーーーー", + # "types": [ + # "katakana", + # "katakana", + # "katakana", + # "katakana", + # "katakana", + # "katakana" + # ], + # "checks": [ + # ] + # } + # ] Added: doc/source/example/reference/normalizers/normalizer-nfkc100-unify-to-romaji-complex.log (+99 -0) 100644 =================================================================== --- /dev/null +++ doc/source/example/reference/normalizers/normalizer-nfkc100-unify-to-romaji-complex.log 2019-01-07 16:34:50 +0900 (100b2b055) @@ -0,0 +1,99 @@ +Execution example:: + + tokenize 'TokenMecab("target_class", "-名詞/非自立", "target_class", "-名詞/接尾/人名", "target_class", "名詞", "include_reading", true)' '彼の名前は山田さんのはずです。' + # [ + # [ + # 0, + # 1546841272.495518, + # 0.0003752708435058594 + # ], 
+ # [ + # { + # "value": "彼", + # "position": 0, + # "force_prefix": false, + # "force_prefix_search": false, + # "metadata": { + # "reading": "カレ" + # } + # }, + # { + # "value": "名前", + # "position": 1, + # "force_prefix": false, + # "force_prefix_search": false, + # "metadata": { + # "reading": "ナマエ" + # } + # }, + # { + # "value": "山田", + # "position": 2, + # "force_prefix": false, + # "force_prefix_search": false, + # "metadata": { + # "reading": "ヤマダ" + # } + # } + # ] + # ] + normalize 'NormalizerNFKC100("unify_to_romaji", true)' "カレ" WITH_TYPES + # [ + # [ + # 0, + # 1546841303.223331, + # 0.000186920166015625 + # ], + # { + # "normalized": "kare", + # "types": [ + # "alpha", + # "alpha", + # "alpha", + # "alpha" + # ], + # "checks": [ + # ] + # } + # ] + normalize 'NormalizerNFKC100("unify_to_romaji", true)' "ナマエ" WITH_TYPES + # [ + # [ + # 0, + # 1546841329.839442, + # 0.0001835823059082031 + # ], + # { + # "normalized": "namae", + # "types": [ + # "alpha", + # "alpha", + # "alpha", + # "alpha", + # "alpha" + # ], + # "checks": [ + # ] + # } + # ] + normalize 'NormalizerNFKC100("unify_to_romaji", true)' "ヤマダ" WITH_TYPES + # [ + # [ + # 0, + # 1546841358.479471, + # 0.0001850128173828125 + # ], + # { + # "normalized": "yamada", + # "types": [ + # "alpha", + # "alpha", + # "alpha", + # "alpha", + # "alpha", + # "alpha" + # ], + # "checks": [ + # ] + # } + # ] Added: doc/source/example/reference/normalizers/normalizer-nfkc100-unify-to-romaji.log (+32 -0) 100644 =================================================================== --- /dev/null +++ doc/source/example/reference/normalizers/normalizer-nfkc100-unify-to-romaji.log 2019-01-07 16:34:50 +0900 (cbac1e106) @@ -0,0 +1,32 @@ +Execution example:: + + normalize 'NormalizerNFKC100("unify_to_romaji", true)' "アァイィウゥエェオォ" WITH_TYPES + # [ + # [ + # 0, + # 1546841200.15132, + # 0.0001931190490722656 + # ], + # { + # "normalized": "axaixiuxuexeoxo", + # "types": [ + # "alpha", + # "alpha", + # "alpha", + # 
"alpha", + # "alpha", + # "alpha", + # "alpha", + # "alpha", + # "alpha", + # "alpha", + # "alpha", + # "alpha", + # "alpha", + # "alpha", + # "alpha" + # ], + # "checks": [ + # ] + # } + # ] Added: doc/source/example/reference/normalizers/normalizer-nfkc100-unify-voiced-sound-mark-hiragana.log (+62 -0) 100644 =================================================================== --- /dev/null +++ doc/source/example/reference/normalizers/normalizer-nfkc100-unify-voiced-sound-mark-hiragana.log 2019-01-07 16:34:50 +0900 (48d8be99d) @@ -0,0 +1,62 @@ +Execution example:: + + normalize 'NormalizerNFKC100("unify_kana_voiced_sound_mark", true)' "かがきぎくぐけげこごさざしじすずせぜそぞただちぢつづてでとどはばぱひびぴふぶぷへべぺほぼぽ" WITH_TYPES + # [ + # [ + # 0, + # 1546840544.908633, + # 0.0002522468566894531 + # ], + # { + # "normalized": "かかききくくけけここささししすすせせそそたたちちつつててととはははひひひふふふへへへほほほ", + # "types": [ + # "hiragana", + # "hiragana", + # "hiragana", + # "hiragana", + # "hiragana", + # "hiragana", + # "hiragana", + # "hiragana", + # "hiragana", + # "hiragana", + # "hiragana", + # "hiragana", + # "hiragana", + # "hiragana", + # "hiragana", + # "hiragana", + # "hiragana", + # "hiragana", + # "hiragana", + # "hiragana", + # "hiragana", + # "hiragana", + # "hiragana", + # "hiragana", + # "hiragana", + # "hiragana", + # "hiragana", + # "hiragana", + # "hiragana", + # "hiragana", + # "hiragana", + # "hiragana", + # "hiragana", + # "hiragana", + # "hiragana", + # "hiragana", + # "hiragana", + # "hiragana", + # "hiragana", + # "hiragana", + # "hiragana", + # "hiragana", + # "hiragana", + # "hiragana", + # "hiragana" + # ], + # "checks": [ + # ] + # } + # ] Added: doc/source/example/reference/normalizers/normalizer-nfkc100-unify-voiced-sound-mark-katakana.log (+62 -0) 100644 =================================================================== --- /dev/null +++ doc/source/example/reference/normalizers/normalizer-nfkc100-unify-voiced-sound-mark-katakana.log 2019-01-07 16:34:50 +0900 (40abe4ed2) @@ -0,0 +1,62 @@ +Execution 
example:: + + normalize 'NormalizerNFKC100("unify_kana_voiced_sound_mark", true)' "カガキギクグケゲコゴサザシジスズセゼソゾタダチヂツヅテデトドハバパヒビピフブプヘベペホボポ" WITH_TYPES + # [ + # [ + # 0, + # 1546840704.477687, + # 0.0002183914184570312 + # ], + # { + # "normalized": "カカキキククケケココササシシススセセソソタタチチツツテテトトハハハヒヒヒフフフヘヘヘホホホ", + # "types": [ + # "katakana", + # "katakana", + # "katakana", + # "katakana", + # "katakana", + # "katakana", + # "katakana", + # "katakana", + # "katakana", + # "katakana", + # "katakana", + # "katakana", + # "katakana", + # "katakana", + # "katakana", + # "katakana", + # "katakana", + # "katakana", + # "katakana", + # "katakana", + # "katakana", + # "katakana", + # "katakana", + # "katakana", + # "katakana", + # "katakana", + # "katakana", + # "katakana", + # "katakana", + # "katakana", + # "katakana", + # "katakana", + # "katakana", + # "katakana", + # "katakana", + # "katakana", + # "katakana", + # "katakana", + # "katakana", + # "katakana", + # "katakana", + # "katakana", + # "katakana", + # "katakana", + # "katakana" + # ], + # "checks": [ + # ] + # } + # ] Added: doc/source/example/reference/normalizers/normalizer-nfkc100.log (+18 -0) 100644 =================================================================== --- /dev/null +++ doc/source/example/reference/normalizers/normalizer-nfkc100.log 2019-01-07 16:34:50 +0900 (52ef7c4aa) @@ -0,0 +1,18 @@ +Execution example:: + + normalize NormalizerNFKC100 "©" WITH_TYPES + # [ + # [ + # 0, + # 1546840077.247264, + # 0.0001680850982666016 + # ], + # { + # "normalized": "©", + # "types": [ + # "emoji" + # ], + # "checks": [ + # ] + # } + # ] Modified: doc/source/reference/normalizers.rst (+6 -0) =================================================================== --- doc/source/reference/normalizers.rst 2019-01-05 17:28:03 +0900 (99b42d252) +++ doc/source/reference/normalizers.rst 2019-01-07 16:34:50 +0900 (4428abead) @@ -73,6 +73,12 @@ Here is a list of built-in normalizers: * ``NormalizerAuto`` * ``NormalizerNFKC51`` +.. 
toctree:: + :maxdepth: 1 + :glob: + + normalizers/* + .. _normalizer-auto: ``NormalizerAuto`` Added: doc/source/reference/normalizers/normalizer_nfkc100.rst (+327 -0) 100644 =================================================================== --- /dev/null +++ doc/source/reference/normalizers/normalizer_nfkc100.rst 2019-01-07 16:34:50 +0900 (1f605b396) @@ -0,0 +1,327 @@ +.. -*- rst -*- + +.. highlightlang:: none + +.. groonga-command +.. database: normalisers + +.. _normalizer-nfkc100: + +``NormalizerNFKC100`` +===================== + +Summary +------- + +.. versionadded:: 8.0.2 + +``NormalizerNFKC100`` normalizes text by Unicode NFKC (Normalization Form Compatibility Composition) +for Unicode version 10.0. + +This normalizer can change behavior by specifying options. + +Syntax +------ + +``NormalizerNFKC100`` has optional parameter:: + +No options:: + + NormalizerNFKC100 + +``NormalizerNFKC100`` normalizes text by Unicode NFKC (Normalization Form Compatibility Composition) +for Unicode version 10.0. + +Specify option:: + + NormalizerNFKC100("unify_kana", true) + + NormalizerNFKC100("unify_kana_case", true) + + NormalizerNFKC100("unify_kana_voiced_sound_mark", true) + + NormalizerNFKC100("unify_hyphen", true) + + NormalizerNFKC100("unify_prolonged_sound_mark", true) + + NormalizerNFKC100("unify_hyphen_and_prolonged_sound_mark", true) + + NormalizerNFKC100("unify_middle_dot", true) + + NormalizerNFKC100("unify_katakana_v_sounds", true) + + NormalizerNFKC100("unify_katakana_bu_sound", true) + + NormalizerNFKC100("unify_to_romaji", true) + +.. versionadded:: 8.0.3 + + :ref:`normalizer-nfkc100-unify-middle-dot` is added. + + :ref:`normalizer-nfkc100-unify-katakana-v-sounds` is added. + + :ref:`normalizer-nfkc100-unify-katakana-bu-sounds` is added. + +.. versionadded:: 8.0.9 + + :ref:`normalizer-nfkc100-unify-to-romaji` is added. 
+ +Specify multiple options:: + + NormalizerNFKC100("unify_to_romaji", true, "unify_kana_case", true, "unify_hyphen_and_prolonged_sound_mark", true) + +``NormalizerNFKC100`` can also take multiple options as above. +You can also combine options other than those in the above example. + +Usage +----- + +Simple usage +^^^^^^^^^^^^ + +Here is an example of ``NormalizerNFKC100``. ``NormalizerNFKC100`` normalizes text by Unicode NFKC (Normalization Form Compatibility Composition) for Unicode version 10.0. + +.. groonga-command +.. include:: ../../example/reference/normalizers/normalizer-nfkc100.log +.. normalize NormalizerNFKC100 "©" WITH_TYPES + +Here is an example of :ref:`normalizer-nfkc100-unify-kana` option. + +This option enables that same pronounced characters in all of full-width Hiragana, full-width Katakana and half-width Katakana are regarded as the same character as below. + +.. groonga-command +.. include:: ../../example/reference/normalizers/normalizer-nfkc100-unify-kana.log +.. normalize 'NormalizerNFKC100("unify_kana", true)' "あイウェおヽヾ" WITH_TYPES + +Here is an example of :ref:`normalizer-nfkc100-unify-kana-case` option. + +This option enables that large and small versions of same letters in all of full-width Hiragana, full-width Katakana and half-width Katakana are regarded as the same character as below. + +.. groonga-command +.. include:: ../../example/reference/normalizers/normalizer-nfkc100-unify-kana-case-hiragana.log +.. normalize 'NormalizerNFKC100("unify_kana_case", true)' "ぁあぃいぅうぇえぉおゃやゅゆょよゎわゕかゖけ" WITH_TYPES + +.. groonga-command +.. include:: ../../example/reference/normalizers/normalizer-nfkc100-unify-kana-case-katakana.log +.. normalize 'NormalizerNFKC100("unify_kana_case", true)' "ァアィイゥウェエォオャヤュユョヨヮワヵカヶケ" WITH_TYPES + +Here is an example of :ref:`normalizer-nfkc100-unify-kana-voiced-sound-mark` option. 
+ +This option enables that letters with/without voiced sound mark and semi voiced sound mark in all of full-width Hiragana, full-width Katakana and half-width Katakana are regarded as the same character as below. + + +.. groonga-command +.. include:: ../../example/reference/normalizers/normalizer-nfkc100-unify-voiced-sound-mark-hiragana.log +.. normalize 'NormalizerNFKC100("unify_kana_voiced_sound_mark", true)' "かがきぎくぐけげこごさざしじすずせぜそぞただちぢつづてでとどはばぱひびぴふぶぷへべぺほぼぽ" WITH_TYPES + +.. groonga-command +.. include:: ../../example/reference/normalizers/normalizer-nfkc100-unify-voiced-sound-mark-katakana.log +.. normalize 'NormalizerNFKC100("unify_kana_voiced_sound_mark", true)' "カガキギクグケゲコゴサザシジスズセゼソゾタダチヂツヅテデトドハバパヒビピフブプヘベペホボポ" WITH_TYPES + +Here is an example of :ref:`normalizer-nfkc100-unify-hyphen` option. +This option enables normalize hyphen to "-" (U+002D HYPHEN-MINUS) as below. + +.. groonga-command +.. include:: ../../example/reference/normalizers/normalizer-nfkc100-unify-hyphen.log +.. normalize 'NormalizerNFKC100("unify_hyphen", true)' "-˗֊‐‑‒–⁃⁻₋−" WITH_TYPES + +Here is an example of :ref:`normalizer-nfkc100-unify-prolonged-sound-mark` option. +This option enables normalize prolonged sound to "ー" (U+30FC KATAKANA-HIRAGANA PROLONGED SOUND MARK) as below. + +.. groonga-command +.. include:: ../../example/reference/normalizers/normalizer-nfkc100-unify-prolonged-sound-mark.log +.. normalize 'NormalizerNFKC100("unify_prolonged_sound_mark", true)' "ー—―─━ー" WITH_TYPES + +Here is an example of :ref:`normalizer-nfkc100-unify-hyphen-and-prolonged-sound-mark` option. +This option enables normalize hyphen and prolonged sound to "-" (U+002D HYPHEN-MINUS) as below. + +.. groonga-command +.. include:: ../../example/reference/normalizers/normalizer-nfkc100-unify-hyphen-and-prolonged-sound-mark.log +.. 
normalize 'NormalizerNFKC100("unify_hyphen_and_prolonged_sound_mark", true)' "-˗֊‐‑‒–⁃⁻₋− ﹣- ー—―─━ー" WITH_TYPES + +Here is an example of :ref:`normalizer-nfkc100-unify-middle-dot` option. +This option enables normalize middle dot to "·" (U+00B7 MIDDLE DOT) as below. + +.. groonga-command +.. include:: ../../example/reference/normalizers/normalizer-nfkc100-unify-middle-dot.log +.. normalize 'NormalizerNFKC100("unify_middle_dot", true)' "·ᐧ•∙⋅⸱・・" WITH_TYPES + +Here is an example of :ref:`normalizer-nfkc100-unify-katakana-v-sounds` option. +This option enables normalize "ヴァヴィヴヴェヴォ" to "バビブベボ" as below. + +.. groonga-command +.. include:: ../../example/reference/normalizers/normalizer-nfkc100-unify-katakana-v-sounds.log +.. normalize 'NormalizerNFKC100("unify_katakana_v_sounds", true)' "ヴァヴィヴヴェヴォヴ" WITH_TYPES + +Here is an example of :ref:`normalizer-nfkc100-unify-katakana-bu-sounds` option. +This option enables normalize "ヴァヴィヴゥヴェヴォ" to "ブ" as below. + +.. groonga-command +.. include:: ../../example/reference/normalizers/normalizer-nfkc100-unify-katakana-bu-sounds.log +.. normalize 'NormalizerNFKC100("unify_katakana_bu_sound", true)' "ヴァヴィヴヴェヴォヴ" WITH_TYPES + +Here is an example of :ref:`normalizer-nfkc100-unify-to-romaji` option. +This option enables normalize hiragana and katakana to romaji as below. + +.. groonga-command +.. include:: ../../example/reference/normalizers/normalizer-nfkc100-unify-to-romaji.log +.. normalize 'NormalizerNFKC100("unify_to_romaji", true)' "アァイィウゥエェオォ" WITH_TYPES + +Advanced usage +-------------- + +You can output romaji of specific a part of speech with using to combine +``TokenMecab`` and ``NormalizerNFKC100`` as below. + +First of all, you extract reading of a noun with excluding non-independent word and suffix of person name with ``target_class`` option and ``include_reading`` option. + +Next, you normalize reading of the noun that extracted with ``unify_to_romaji`` option of ``NormalizerNFKC100``. + +.. groonga-command +.. 
include:: ../../example/reference/normalizers/normalizer-nfkc100-unify-to-romaji-complex.log +.. tokenize 'TokenMecab("target_class", "-名詞/非自立", "target_class", "-名詞/接尾/人名", "target_class", "名詞", "include_reading", true)' '彼の名前は山田さんのはずです。' +.. normalize 'NormalizerNFKC100("unify_to_romaji", true)' "カレ" WITH_TYPES +.. normalize 'NormalizerNFKC100("unify_to_romaji", true)' "ナマエ" WITH_TYPES +.. normalize 'NormalizerNFKC100("unify_to_romaji", true)' "ヤマダ" WITH_TYPES + +Parameters +---------- + +Optional parameter +^^^^^^^^^^^^^^^^^^ + +There are optional parameters as below. + +.. _normalizer-nfkc100-unify-kana: + +``unify_kana`` +"""""""""""""" + +This option enables that same pronounced characters in all of full-width Hiragana, full-width Katakana and half-width Katakana are regarded as the same character. + +.. _normalizer-nfkc100-unify-kana-case: + +``unify_kana_case`` +""""""""""""""""""" + +This option enables that large and small versions of same letters in all of full-width Hiragana, full-width Katakana and half-width Katakana are regarded as the same character. + +.. _normalizer-nfkc100-unify-kana-voiced-sound-mark: + +``unify_kana_voiced_sound_mark`` +"""""""""""""""""""""""""""""""" + +This option enables that letters with/without voiced sound mark and semi voiced sound mark in all of full-width Hiragana, full-width Katakana and half-width Katakana are regarded as the same character. + +.. _normalizer-nfkc100-unify-hyphen: + +``unify_hyphen`` +"""""""""""""""" + +This option enables normalize hyphen to "-" (U+002D HYPHEN-MINUS). + +Hyphen of the target of normalizing is as below. + +* "-" (U+002D HYPHEN-MINUS) +* "֊" (U+058A ARMENIAN HYPHEN) +* "˗" (U+02D7 MODIFIER LETTER MINUS SIGN) +* "‐" (U+2010 HYPHEN) +* "—" (U+2014 EM DASH) +* "⁃" (U+2043 HYPHEN BULLET) +* "⁻" (U+207B SUPERSCRIPT MINUS) +* "₋" (U+208B SUBSCRIPT MINUS) +* "−" (U+2212 MINUS SIGN) + +.. 
_normalizer-nfkc100-unify-prolonged-sound-mark: + +``unify_prolonged_sound_mark`` +"""""""""""""""""""""""""""""" + +This option enables normalize prolonged sound to "-" (U+30FC KATAKANA-HIRAGANA PROLONGED SOUND MARK). + +Prolonged sound of the target of normalizing is as below. + +* "—" (U+2014 EM DASH) +* "―" (U+2015 HORIZONTAL BAR) +* "─" (U+2500 BOX DRAWINGS LIGHT HORIZONTAL) +* "━" (U+2501 BOX DRAWINGS HEAVY HORIZONTAL) +* "ー" (U+30FC KATAKANA-HIRAGANA PROLONGED SOUND MARK) +* "ー" (U+FF70 HALFWIDTH KATAKANA-HIRAGANA PROLONGED SOUND MARK) + +.. _normalizer-nfkc100-unify-hyphen-and-prolonged-sound-mark: + +``unify_hyphen_and_prolonged_sound_mark`` +""""""""""""""""""""""""""""""""""""""""" + +This option enables normalize hyphen and prolonged sound to "-" (U+002D HYPHEN-MINUS). + +Hyphen and prolonged sound of the target normalizing is below. + +* "-" (U+002D HYPHEN-MINUS) +* "֊" (U+058A ARMENIAN HYPHEN) +* "˗" (U+02D7 MODIFIER LETTER MINUS SIGN) +* "‐" (U+2010 HYPHEN) +* "—" (U+2014 EM DASH) +* "⁃" (U+2043 HYPHEN BULLET) +* "⁻" (U+207B SUPERSCRIPT MINUS) +* "₋" (U+208B SUBSCRIPT MINUS) +* "−" (U+2212 MINUS SIGN) + +* "—" (U+2014 EM DASH) +* "―" (U+2015 HORIZONTAL BAR) +* "─" (U+2500 BOX DRAWINGS LIGHT HORIZONTAL) +* "━" (U+2501 BOX DRAWINGS HEAVY HORIZONTAL) +* "ー" (U+30FC KATAKANA-HIRAGANA PROLONGED SOUND MARK) +* "ー" (U+FF70 HALFWIDTH KATAKANA-HIRAGANA PROLONGED SOUND MARK) + +.. _normalizer-nfkc100-unify-middle-dot: + +``unify_middle_dot`` +"""""""""""""""""""" + +.. versionadded:: 8.0.3 + +This option enables normalize middle dot to "·" (U+00B7 MIDDLE DOT). + +Middle dot of the target of normalizing is as below. + +* "·" (U+00B7 MIDDLE DOT) +* "ᐧ" (U+1427 CANADIAN SYLLABICS FINAL MIDDLE DOT) +* "•" (U+2022 BULLET) +* "∙" (U+2219 BULLET OPERATOR) +* "⋅" (U+22C5 DOT OPERATOR) +* "⸱" (U+2E31 WORD SEPARATOR MIDDLE DOT) +* "・" (U+30FB KATAKANA MIDDLE DOT) +* "・" (U+FF65 HALFWIDTH KATAKANA MIDDLE DOT) + +.. 
_normalizer-nfkc100-unify-katakana-v-sounds: + +``unify_katakana_v_sounds`` +""""""""""""""""""""""""""" + +.. versionadded:: 8.0.3 + +This option enables normalize "ヴァヴィヴヴェヴォ" to "バビブベボ". + +.. _normalizer-nfkc100-unify-katakana-bu-sounds: + +``unify_katakana_bu_sound`` +""""""""""""""""""""""""""" + +.. versionadded:: 8.0.3 + +This option enables normalize "ヴァヴィヴゥヴェヴォ" to "ブ". + +.. _normalizer-nfkc100-unify-to-romaji: + +``unify_to_romaji`` +""""""""""""""""""" + +.. versionadded:: 8.0.9 + +This option enables normalize hiragana and katakana to romaji. + +See also +---------- + +* :doc:`../commands/normalize` -------------- next part -------------- An HTML attachment was scrubbed... URL: <https://lists.osdn.me/mailman/archives/groonga-commit/attachments/20190107/d5d03d42/attachment-0001.html>
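As a rough illustration of what the ``unify_hyphen``, ``unify_prolonged_sound_mark`` and ``unify_hyphen_and_prolonged_sound_mark`` options described in the patch above do, the following Python sketch applies NFKC and then maps the listed code points to a single target character. It is not Groonga's implementation: ``unify`` is a hypothetical helper name, the code point lists are copied from the option descriptions, and Python's ``unicodedata`` module uses the Unicode version bundled with the interpreter, which is not necessarily 10.0.

```python
import unicodedata

# Code points copied from the unify_hyphen list in the patch above.
HYPHENS = "\u002D\u058A\u02D7\u2010\u2014\u2043\u207B\u208B\u2212"
# Code points copied from the unify_prolonged_sound_mark list in the patch above.
PROLONGED_SOUND_MARKS = "\u2014\u2015\u2500\u2501\u30FC\uFF70"


def unify(text, sources, target):
    """Hypothetical helper: apply NFKC, then map each character in
    `sources` to `target`. An approximation for illustration only."""
    table = {ord(ch): target for ch in sources}
    return unicodedata.normalize("NFKC", text).translate(table)


# unify_hyphen: everything becomes U+002D HYPHEN-MINUS.
print(unify("\u2010\u2212\u207B", HYPHENS, "-"))
# unify_prolonged_sound_mark: everything becomes U+30FC.
print(unify("\u30C7\u2500\u30BF", PROLONGED_SOUND_MARKS, "\u30FC"))
# unify_hyphen_and_prolonged_sound_mark: both groups become U+002D.
print(unify("\u30FC\u2014\u2010", HYPHENS + PROLONGED_SOUND_MARKS, "-"))
```

Running NFKC first means that compatibility variants such as U+FF70 or U+207B are already folded to U+30FC or U+2212 before the lookup table is consulted, so the table only needs to handle the characters that survive NFKC unchanged.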