[Groonga-commit] groonga/groonga at 97ed5ec [master] mecab: ignore trailing space only token


Kouhei Sutou null+****@clear*****
Sat Feb 28 20:10:45 JST 2015


Kouhei Sutou	2015-02-28 20:10:45 +0900 (Sat, 28 Feb 2015)

  New Revision: 97ed5ec079ebebc67125741590c248b524b0e9d7
  https://github.com/groonga/groonga/commit/97ed5ec079ebebc67125741590c248b524b0e9d7

  Message:
    mecab: ignore trailing space only token

  Added files:
    test/command/suite/tokenizers/mecab/full_width_space/last.expected
    test/command/suite/tokenizers/mecab/full_width_space/last.test
  Modified files:
    plugins/tokenizers/mecab.c

  Modified: plugins/tokenizers/mecab.c (+1 -1)
===================================================================
--- plugins/tokenizers/mecab.c    2015-02-28 20:02:09 +0900 (71cc950)
+++ plugins/tokenizers/mecab.c    2015-02-28 20:10:45 +0900 (7b3d59c)
@@ -247,7 +247,7 @@ mecab_next(grn_ctx *ctx, int nargs, grn_obj **args, grn_user_data *user_data)
       }
     }
 
-    if (r == e) {
+    if (r == e || tokenizer->next == e) {
       status = GRN_TOKENIZER_LAST;
     } else {
       status = GRN_TOKENIZER_CONTINUE;

  Added: test/command/suite/tokenizers/mecab/full_width_space/last.expected (+2 -0) 100644
===================================================================
--- /dev/null
+++ test/command/suite/tokenizers/mecab/full_width_space/last.expected    2015-02-28 20:10:45 +0900 (98f6661)
@@ -0,0 +1,2 @@
+tokenize TokenMecab '日本 '
+[[0,0.0,0.0],[{"value":"日本","position":0}]]

  Added: test/command/suite/tokenizers/mecab/full_width_space/last.test (+1 -0) 100644
===================================================================
--- /dev/null
+++ test/command/suite/tokenizers/mecab/full_width_space/last.test    2015-02-28 20:10:45 +0900 (145636b)
@@ -0,0 +1 @@
+tokenize TokenMecab '日本 '