[Groonga-commit] groonga/groonga-normalizer-mysql [master] Check whether bytesize of expanded character is larger than one of the original

Kouhei Sutou null+****@clear*****
Sun Feb 10 21:36:51 JST 2013


Kouhei Sutou	2013-02-10 21:36:51 +0900 (Sun, 10 Feb 2013)

  New Revision: c983be73feb5ace8eb8a33a49de6b0f3b1fdd815
  https://github.com/groonga/groonga-normalizer-mysql/commit/c983be73feb5ace8eb8a33a49de6b0f3b1fdd815

  Log:
    Check whether bytesize of expanded character is larger than one of the original

  Modified files:
    tool/dump-difference-uca.rb

  Modified: tool/dump-difference-uca.rb (+9 -0)
===================================================================
--- tool/dump-difference-uca.rb    2013-02-10 21:10:35 +0900 (5ea50b6)
+++ tool/dump-difference-uca.rb    2013-02-10 21:36:51 +0900 (45e60c2)
@@ -26,9 +26,17 @@ parser = CTypeUCAParser.new
 parser.parse(ARGF)
 
 n_idencials = 0
+n_expanded_characters = 0
 parser.weight_based_characters.each do |weight, characters|
   next if characters.size == 1
   n_idencials += 1
+  representative_character = characters.first
+  rest_characters = characters[1..-1]
+  rest_characters.each do |character|
+    if representative_character[:utf8].bytesize > character[:utf8].bytesize
+      n_expanded_characters += 1
+    end
+  end
   formatted_weight = weight.collect {|component| '%#07x' % component}.join(', ')
   puts "weight: #{formatted_weight}"
   characters.each do |character|
@@ -39,3 +47,4 @@ parser.weight_based_characters.each do |weight, characters|
 end
 
 puts "Number of idencial weights #{n_idencials}"
+puts "Number of expanded characters: #{n_expanded_characters}"
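
The counting logic added in this commit can be tried in isolation. The following is a minimal sketch, not the tool itself: the method name count_expanded_characters and the sample table are hypothetical, and the table only imitates the shape that parser.weight_based_characters appears to have in the diff (a weight array mapped to a list of hashes with a :utf8 key). The sample weights are made up and are not real UCA data.

```ruby
# Hypothetical helper: the commit inlines this logic in the main loop.
# Counts characters whose representative (the first character sharing
# the same weight) takes more bytes in UTF-8 than the character itself.
def count_expanded_characters(weight_based_characters)
  n_expanded_characters = 0
  weight_based_characters.each do |weight, characters|
    # A weight with a single character has nothing to compare against.
    next if characters.size == 1
    representative_character = characters.first
    rest_characters = characters[1..-1]
    rest_characters.each do |character|
      if representative_character[:utf8].bytesize > character[:utf8].bytesize
        n_expanded_characters += 1
      end
    end
  end
  n_expanded_characters
end

# Made-up sample data (assumption): weights and groupings are illustrative.
sample = {
  [0x0e33] => [{:utf8 => "\u00e0"}, {:utf8 => "a"}, {:utf8 => "\u00e1"}],
  [0x0e4a] => [{:utf8 => "b"}, {:utf8 => "B"}],
  [0x0e60] => [{:utf8 => "c"}],
}

puts "Number of expanded characters: #{count_expanded_characters(sample)}"
```

With this sample, only "a" (1 byte) is smaller than its representative "\u00e0" (2 bytes in UTF-8), so the count is 1. A non-zero count matters if a normalizer rewrites text to the representative character, because the output can then need more bytes than the input.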