Kouhei Sutou
null+****@clear*****
Sun Feb 10 21:36:51 JST 2013
Kouhei Sutou  2013-02-10 21:36:51 +0900 (Sun, 10 Feb 2013)

  New Revision: c983be73feb5ace8eb8a33a49de6b0f3b1fdd815
  https://github.com/groonga/groonga-normalizer-mysql/commit/c983be73feb5ace8eb8a33a49de6b0f3b1fdd815

  Log:
    Check whether bytesize of expanded character is larger than one of the original

  Modified files:
    tool/dump-difference-uca.rb

  Modified: tool/dump-difference-uca.rb (+9 -0)
===================================================================
--- tool/dump-difference-uca.rb    2013-02-10 21:10:35 +0900 (5ea50b6)
+++ tool/dump-difference-uca.rb    2013-02-10 21:36:51 +0900 (45e60c2)
@@ -26,9 +26,17 @@
 parser = CTypeUCAParser.new
 parser.parse(ARGF)
 n_idencials = 0
+n_expanded_characters = 0
 parser.weight_based_characters.each do |weight, characters|
   next if characters.size == 1
   n_idencials += 1
+  representative_character = characters.first
+  rest_characters = characters[1..-1]
+  rest_characters.each do |character|
+    if representative_character[:utf8].bytesize > character[:utf8].bytesize
+      n_expanded_characters += 1
+    end
+  end
   formatted_weight = weight.collect {|component| '%#07x' % component}.join(', ')
   puts "weight: #{formatted_weight}"
   characters.each do |character|
@@ -39,3 +47,4 @@ parser.weight_based_characters.each do |weight, characters|
 end
 
 puts "Number of idencial weights #{n_idencials}"
+puts "Number of expanded characters: #{n_expanded_characters}"
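The counting logic added in this commit can be tried in isolation. This is a minimal sketch, assuming (as the diff's `character[:utf8]` suggests) that `weight_based_characters` yields a weight together with an array of hashes whose `:utf8` key holds the character's UTF-8 string. The helper name `count_expanded_characters` and the sample weights are hypothetical and invented for illustration; only the `bytesize` comparison against `characters.first` mirrors the diff.

```ruby
# Hypothetical helper reproducing the check from the diff: within each group of
# characters sharing a weight, count the characters whose UTF-8 encoding is
# shorter than the group's first (representative) character. Mapping such a
# character to the representative would make the byte string longer.
def count_expanded_characters(weight_based_characters)
  n_expanded_characters = 0
  weight_based_characters.each do |_weight, characters|
    next if characters.size == 1 # a lone character has nothing to compare with
    representative_character = characters.first
    characters.drop(1).each do |character|
      if representative_character[:utf8].bytesize > character[:utf8].bytesize
        n_expanded_characters += 1
      end
    end
  end
  n_expanded_characters
end

# Sample input with made-up weights (not real UCA weights).
sample = {
  [0x0e33] => [{utf8: "\u00C5"}, {utf8: "A"}],  # 2 bytes vs 1 byte: counted
  [0x0e34] => [{utf8: "b"}, {utf8: "\uFF42"}],  # 1 byte vs 3 bytes: not counted
  [0x0e35] => [{utf8: "c"}],                    # single character: skipped
}

puts count_expanded_characters(sample)
```

Because each character is compared only with `characters.first`, the count depends on which character the parser lists first in a group; the sketch keeps that behavior from the commit rather than choosing a different representative.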