[Groonga-commit] groonga/groonga at 5c0c0ae [master] Ignore an empty token from tokenizer


Kouhei Sutou null+****@clear*****
Sun Sep 15 15:39:06 JST 2013


Kouhei Sutou	2013-09-15 15:39:06 +0900 (Sun, 15 Sep 2013)

  New Revision: 5c0c0aef49b418f13d8bc2c2944a28239f82fb4f
  https://github.com/groonga/groonga/commit/5c0c0aef49b418f13d8bc2c2944a28239f82fb4f

  Message:
    Ignore an empty token from tokenizer
    
    [groonga-dev,01729]
    
    Suggested by Naoya Murakami. Thanks!!!

  Added files:
    test/command/suite/tokenizers/delimit/invalid/empty.expected
    test/command/suite/tokenizers/delimit/invalid/empty.test
  Modified files:
    lib/token.c

  Modified: lib/token.c (+4 -0)
===================================================================
--- lib/token.c    2013-09-14 15:44:20 +0900 (ef759c6)
+++ lib/token.c    2013-09-15 15:39:06 +0900 (bbd688a)
@@ -566,6 +566,10 @@ grn_token_next(grn_ctx *ctx, grn_token *token)
                         (status & GRN_TOKENIZER_TOKEN_REACH_END)))
         ? GRN_TOKEN_DONE : GRN_TOKEN_DOING;
       token->force_prefix = 0;
+      if (token->curr_size == 0) {
+        GRN_LOG(ctx, GRN_WARN, "[token_next] ignore an empty token.");
+        continue;
+      }
       if (token->curr_size > GRN_TABLE_MAX_KEY_SIZE) {
         GRN_LOG(ctx, GRN_WARN,
                 "[token_next] ignore too long token. "

  Added: test/command/suite/tokenizers/delimit/invalid/empty.expected (+2 -0) 100644
===================================================================
--- /dev/null
+++ test/command/suite/tokenizers/delimit/invalid/empty.expected    2013-09-15 15:39:06 +0900 (96d13e2)
@@ -0,0 +1,2 @@
+tokenize TokenDelimit "A  B"
+[[0,0.0,0.0],[{"value":"A","position":0},{"value":"B","position":1}]]

  Added: test/command/suite/tokenizers/delimit/invalid/empty.test (+1 -0) 100644
===================================================================
--- /dev/null
+++ test/command/suite/tokenizers/delimit/invalid/empty.test    2013-09-15 15:39:06 +0900 (f765566)
@@ -0,0 +1 @@
+tokenize TokenDelimit "A  B"
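
For illustration only, the effect of the change can be sketched outside Groonga as a small standalone C program fragment. This is not the code in lib/token.c: the names sketch_token and sketch_tokenize_delimit are hypothetical, and the sketch assumes a tokenizer that splits on single spaces. It skips zero-length tokens without advancing the position, which matches the expected output of the new test ("A" at position 0, "B" at position 1).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token record for this sketch; not Groonga's grn_token. */
typedef struct {
  const char *value; /* points into the input; not NUL-terminated */
  size_t size;       /* token length in bytes */
  int position;      /* index among the non-empty tokens */
} sketch_token;

/*
 * Splits text on single spaces and stores at most max_tokens tokens.
 * Zero-length tokens (for example, between two adjacent spaces) are
 * skipped and do not consume a position.
 * Returns the number of tokens stored.
 */
size_t
sketch_tokenize_delimit(const char *text,
                        sketch_token *tokens,
                        size_t max_tokens)
{
  size_t n = 0;
  int position = 0;
  const char *start = text;

  assert(text != NULL);
  assert(tokens != NULL);

  while (n < max_tokens) {
    const char *token_start = start;
    const char *end = start;
    size_t size;
    int reached_end;

    /* Scan to the next delimiter or to the end of the input. */
    while (*end != '\0' && *end != ' ') {
      end++;
    }
    size = (size_t)(end - token_start);
    reached_end = (*end == '\0');
    start = reached_end ? end : end + 1;

    if (size == 0) {
      /* Ignore an empty token, as the patch does in grn_token_next(). */
      if (reached_end) {
        break;
      }
      continue;
    }

    tokens[n].value = token_start;
    tokens[n].size = size;
    tokens[n].position = position++;
    n++;

    if (reached_end) {
      break;
    }
  }
  return n;
}
```

Calling sketch_tokenize_delimit("A  B", tokens, 4) with a four-element array stores two tokens. The empty token between the two spaces is dropped, so "B" keeps position 1.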