[Groonga-commit] ranguba/chupa-text-decomposer-html at 3bc8173 [master] Scrub invalid characters

Kouhei Sutou null+****@clear*****
Thu Mar 2 00:10:27 JST 2017


Kouhei Sutou	2017-03-02 00:10:27 +0900 (Thu, 02 Mar 2017)

  New Revision: 3bc8173a513a102005b3761fa4da5816ff2bc865
  https://github.com/ranguba/chupa-text-decomposer-html/commit/3bc8173a513a102005b3761fa4da5816ff2bc865

  Message:
    Scrub invalid characters

  Modified files:
    lib/chupa-text/decomposers/html.rb

  Modified: lib/chupa-text/decomposers/html.rb (+1 -1)
===================================================================
--- lib/chupa-text/decomposers/html.rb    2017-03-02 00:03:41 +0900 (42d1bef)
+++ lib/chupa-text/decomposers/html.rb    2017-03-02 00:10:27 +0900 (3b0095c)
@@ -37,7 +37,7 @@ module ChupaText
         doc = Nokogiri::HTML.parse(html, nil, guess_encoding(html))
         body_element = (doc % "body")
         if body_element
-          body = body_element.text.gsub(/^\s+|\s+$/, '')
+          body = body_element.text.scrub.gsub(/^\s+|\s+$/, '')
         else
           body = ""
         end
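
A minimal sketch of why the added `scrub` call matters, assuming the extracted body text can contain bytes that are invalid in its encoding. `String#scrub` (Ruby 2.1 and later) replaces invalid byte sequences, by default with U+FFFD for UTF-8 strings, so the following regex-based `gsub` no longer fails on a broken string. The `clean_body` helper and the sample input are hypothetical and only mirror the changed line; they are not part of chupa-text.

```ruby
# Hypothetical helper mirroring the changed line; not part of chupa-text.
def clean_body(text)
  # Replace invalid byte sequences first, then strip leading and trailing
  # whitespace on each line (^ and $ are line anchors in Ruby).
  text.scrub.gsub(/^\s+|\s+$/, '')
end

# "\xFF" can never appear in valid UTF-8, so this string is broken.
raw = "  Hello \xFF world  \n"
puts raw.valid_encoding?        # false

begin
  # Regex matching against a broken string is expected to raise ArgumentError.
  raw.gsub(/^\s+|\s+$/, '')
rescue ArgumentError => e
  puts "without scrub: #{e.message}"
end

cleaned = clean_body(raw)
puts cleaned.valid_encoding?    # true
puts cleaned.inspect
```

The scrubbed result keeps the surrounding text and marks the bad byte with a replacement character instead of aborting the whole decomposition.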