[Groonga-commit] groonga/groonga [master] [doc][suggest] add about cooccurrence search for completion.


Wed, 10 Aug 2011 12:23:44 JST


Kouhei Sutou	2011-08-10 03:23:44 +0000 (Wed, 10 Aug 2011)

  New Revision: b2c124834d1aa1b34e8aa14282b780f8b6014ebf

  Log:
    [doc][suggest] add about cooccurrence search for completion.

  Modified files:
    doc/source/suggest/completion.txt

  Modified: doc/source/suggest/completion.txt (+91 -4)
===================================================================
--- doc/source/suggest/completion.txt    2011-08-10 00:24:43 +0000 (124dcc1)
+++ doc/source/suggest/completion.txt    2011-08-10 03:23:44 +0000 (db58c2e)
@@ -25,8 +25,95 @@ Prefix RK search
 ^^^^^^^^^^^^^^^^
 
 RK means Romaji and Katakana. Prefix RK search can find
-registered words by romaji, katakana or hiragana.
+registered words that start with the user's input in romaji,
+katakana or hiragana. It's useful for searching in Japanese.
 
-For example, there is a registered word "日本". "ニホン" (it
-must be katakana) is registered as its reading. An user can
-find "日本" by "ni", "二" or "に".
+For example, suppose there is a registered word "日本" and "ニホン"
+(it must be katakana) is registered as its reading. A user
+can find "日本" by "ni", "ニ" or "に".
+
+Cooccurrence search
+^^^^^^^^^^^^^^^^^^^
+
+Cooccurrence search can find registered words from a user's
+partial input. It uses user input sequences that are
+learned from query logs, access logs and so on.
+
+For example, there is the following user input sequence:
+
++----------+----------+
+|  input   |  submit  |
++----------+----------+
+|s         |no        |
++----------+----------+
+|se        |no        |
++----------+----------+
+|sea       |no        |
++----------+----------+
+|sear      |no        |
++----------+----------+
+|searc     |no        |
++----------+----------+
+|search    |yes       |
++----------+----------+
+|e         |no        |
++----------+----------+
+|en        |no        |
++----------+----------+
+|eng       |no        |
++----------+----------+
+|engi      |no        |
++----------+----------+
+|engin     |no        |
++----------+----------+
+|engine    |no        |
++----------+----------+
+|enginen   |no (typo!)|
++----------+----------+
+|engine    |yes       |
++----------+----------+
+
+Groonga creates the following completion pairs:
+
++----------+--------------------+
+|  input   |   completed word   |
++----------+--------------------+
+|s         |search              |
++----------+--------------------+
+|se        |search              |
++----------+--------------------+
+|sea       |search              |
++----------+--------------------+
+|sear      |search              |
++----------+--------------------+
+|searc     |search              |
++----------+--------------------+
+|e         |engine              |
++----------+--------------------+
+|en        |engine              |
++----------+--------------------+
+|eng       |engine              |
++----------+--------------------+
+|engi      |engine              |
++----------+--------------------+
+|engin     |engine              |
++----------+--------------------+
+|engine    |engine              |
++----------+--------------------+
+|enginen   |engine              |
++----------+--------------------+
+
+Every input that the user did not submit (e.g. "s", "se" and
+so on) before a submission maps to the submitted input that
+follows it (e.g. "search").
+
+To be precise, this description isn't correct because it
+omits time stamps. Groonga doesn't care about "all
+not-submitted inputs before each user
+submission". Groonga only cares about "all not-submitted
+inputs within a minute before each user
+submission". Groonga ignores user inputs made more than
+a minute before the submission.
+
+If a user inputs "sea", cooccurrence search returns "search"
+because "sea" is paired with the completed word "search".
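
The pairing rule described in the added documentation can be sketched in a few lines of Python. This is only an illustration, not Groonga's implementation: the function name `learn_pairs`, the `(time, text, submitted)` event tuples, the one-second spacing between inputs and the extra "z" row are all assumptions made for the example. The one-minute window comes from the text above.

```python
from datetime import datetime, timedelta

# Taken from the text: only inputs made within a minute before a
# submission are paired with it.
WINDOW = timedelta(minutes=1)


def learn_pairs(events):
    """Return {input: set of completed words} from (time, text, submitted) events."""
    pairs = {}
    pending = []  # not-submitted inputs seen since the last submission
    for time, text, submitted in events:
        if not submitted:
            pending.append((time, text))
            continue
        # A submission: pair it with every recent not-submitted input.
        for input_time, input_text in pending:
            if time - input_time <= WINDOW:
                pairs.setdefault(input_text, set()).add(text)
        pending = []
    return pairs


# The input sequence from the table, one input per second (assumed timing).
start = datetime(2011, 8, 10, 12, 0, 0)
typed = ["s", "se", "sea", "sear", "searc", "search",
         "e", "en", "eng", "engi", "engin", "engine", "enginen", "engine"]
submitted = {5, 13}  # positions of the "yes" rows in the table
events = [(start + timedelta(seconds=i), text, i in submitted)
          for i, text in enumerate(typed)]
# Hypothetical extra row: typed five minutes earlier, so it is outside the window.
events.insert(0, (start - timedelta(minutes=5), "z", False))

pairs = learn_pairs(events)
print(sorted(pairs["sea"]))      # ['search']
print(sorted(pairs["enginen"]))  # ['engine']
print("z" in pairs)              # False
```

Looking up `pairs["sea"]` then plays the role of the cooccurrence search in the last paragraph of the diff: the partial input "sea" is found in the input column and its completed word "search" is returned.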



