[Groonga-commit] ranguba/chupa-text-decomposer-pdf at 2465724 [master] Support content based PDF detection for no extension case

Kouhei Sutou null+****@clear*****
Wed Jul 5 14:16:23 JST 2017


Kouhei Sutou	2017-07-05 14:16:23 +0900 (Wed, 05 Jul 2017)

  New Revision: 2465724d0cf0e2e9fbaf15530fa3db75e8ef70e8
  https://github.com/ranguba/chupa-text-decomposer-pdf/commit/2465724d0cf0e2e9fbaf15530fa3db75e8ef70e8

  Message:
    Support content based PDF detection for no extension case

  Modified files:
    lib/chupa-text/decomposers/pdf.rb

  Modified: lib/chupa-text/decomposers/pdf.rb (+10 -2)
===================================================================
--- lib/chupa-text/decomposers/pdf.rb    2017-05-02 13:01:21 +0900 (cfd86ed)
+++ lib/chupa-text/decomposers/pdf.rb    2017-07-05 14:16:23 +0900 (e2e6403)
@@ -24,8 +24,16 @@ module ChupaText
       registry.register("pdf", self)
 
       def target?(data)
-        (data.extension == "pdf" and data.body.start_with?("%PDF-1")) or
-          data.mime_type == "application/pdf"
+        return true if data.mime_type == "application/pdf"
+
+        return false if data.body.nil?
+
+        case data.extension
+        when nil, "pdf"
+          data.body.start_with?("%PDF-1")
+        else
+          false
+        end
       end
 
       def decompose(data)
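
The decision order introduced by this change can be tried in isolation. The following is a minimal sketch, not the decomposer itself: `FakeData` and `pdf_target?` are hypothetical names used only for illustration, and `FakeData` models just the three attributes that `target?` reads (`mime_type`, `extension`, `body`). The body of `pdf_target?` mirrors the added lines in the diff above.

```ruby
# Hypothetical stand-in for the data object passed to target?.
# Only the attributes read by the diff are modeled.
FakeData = Struct.new(:mime_type, :extension, :body)

# Same checks, in the same order, as the new target? in the diff.
def pdf_target?(data)
  # An explicit MIME type wins, even when there is no body.
  return true if data.mime_type == "application/pdf"

  # Without a body there is nothing to sniff.
  return false if data.body.nil?

  case data.extension
  when nil, "pdf"
    # No extension or ".pdf": decide by the PDF header.
    data.body.start_with?("%PDF-1")
  else
    # Any other extension is never treated as PDF by content alone.
    false
  end
end

pdf_body = "%PDF-1.4\n"

p pdf_target?(FakeData.new(nil, nil, pdf_body))            # => true
p pdf_target?(FakeData.new(nil, "pdf", pdf_body))          # => true
p pdf_target?(FakeData.new(nil, "txt", pdf_body))          # => false
p pdf_target?(FakeData.new(nil, "pdf", "plain text"))      # => false
p pdf_target?(FakeData.new("application/pdf", nil, nil))   # => true
p pdf_target?(FakeData.new(nil, nil, nil))                 # => false
```

The first call is the case named in the commit message: data with no extension is now accepted when its content starts with the PDF header. The last call shows the effect of the added `nil` guard, which returns `false` before `start_with?` is ever called on a missing body.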