[Groonga-commit] groonga/groonga [master] doc: describe query_expansion

null+****@clear***** null+****@clear*****
Sun May 20 20:08:57 JST 2012


Kouhei Sutou	2012-05-20 20:08:57 +0900 (Sun, 20 May 2012)

  New Revision: 764c1a41969fb2da74319ef3523be96215afbd4e

  Log:
    doc: describe query_expansion

  Added files:
    doc/source/example/commands/select/query_expansion_complex.log
    doc/source/example/commands/select/query_expansion_substitute.log
    doc/source/example/commands/select/query_expansion_substitution_table.log
  Modified files:
    doc/source/commands/select.txt

  Modified: doc/source/commands/select.txt (+94 -6)
===================================================================
--- doc/source/commands/select.txt    2012-05-20 09:03:18 +0900 (4a6eebe)
+++ doc/source/commands/select.txt    2012-05-20 20:08:57 +0900 (785cc05)
@@ -67,7 +67,13 @@ Here are a schema definition and sample data to show usage.
 ..  "n_likes": 10},
 .. {"_key":    "Mroonga",
 ..  "content": "I also started to use mroonga. It's also very fast! Really fast!",
-..  "n_likes": 15}
+..  "n_likes": 15},
+.. {"_key":    "Good-bye Senna",
+..  "content": "I migrated all Senna system!",
+..  "n_likes": 3},
+.. {"_key":    "Good-bye Tritonn",
+..  "content": "I also migrated all Tritonn system!",
+..  "n_likes": 6}
 .. ]
 
 There is a table, ``Entries``, for blog entries. An entry has title,
@@ -435,14 +441,96 @@ more searches aren't executed. And no records are matched.
 ``query_expansion``
 """""""""""""""""""
 
-TODO: write in English and add example.
+It's for query expansion. Query expansion substitutes specific words
+in a query with other words. Normally, it's used for synonym search.
+
+It specifies a column that is used to substitute ``query`` parameter
+value. The format of this parameter value is
+"``${TABLE}.${COLUMN}``". For example, "``Terms.synonym``" specifies
+``synonym`` column in ``Terms`` table.
+
+A table for query expansion is called a "substitution table". A
+substitution table's key must be ``ShortText``. So an array table
+(``TABLE_NO_KEY``) can't be used for query expansion, because an array
+table doesn't have a key.
+
+A column for query expansion is called a "substitution
+column". A substitution column's value type must be
+``ShortText``. Its column type must be vector (``COLUMN_VECTOR``). A
+key of the substitution table that appears in the query is substituted
+with the values in the substitution column.
+
+If a word in ``query`` is a key of the substitution table, the word is
+substituted with the substitution column value that is associated with
+the key. Substitution isn't performed recursively. It means that
+substitution target words and phrases in the substituted query aren't
+substituted again.
+
+Here is a sample substitution table to show a simple
+``query_expansion`` with substitution column usage example.
+
+.. groonga-command
+.. include:: ../example/commands/select/query_expansion_substitution_table.log
+.. table_create Thesaurus TABLE_PAT_KEY|KEY_NORMALIZE ShortText
+.. column_create Thesaurus synonym COLUMN_VECTOR ShortText
+.. load --table Thesaurus
+.. [
+.. {"_key": "mroonga", "synonym": ["mroonga", "tritonn", "groonga mysql"]},
+.. {"_key": "groonga", "synonym": ["groonga", "senna"]}
+.. ]
 
-It specifies a column that is used to expand (substitute) ``query``
-parameter value.
+The ``Thesaurus`` substitution table has two synonyms, ``"mroonga"``
+and ``"groonga"``. If a user searches with ``"mroonga"``, groonga
+searches with ``"((mroonga) OR (tritonn) OR (groonga mysql))"``. If a
+user searches with ``"groonga"``, groonga searches with ``"((groonga)
+OR (senna))"``. Normally, it's a good idea for the substitution table to
+have the ``KEY_NORMALIZE`` flag. If the flag is used, a substitution
+target word is matched in a case-insensitive manner.
+
+Note that those synonym values include the key value such as
+``"mroonga"`` and ``"groonga"``. It's recommended that you include the
+key value. If you don't include it, the substituted value doesn't
+include the original substitution target. Normally, including the
+original value gives better search results. If you have a word that you
+don't want to be searched, you should not include the original word.
+For example, you can implement "stop words" by an empty vector value.
+
+Here is a simple ``query_expansion`` with substitution column
+usage example.
+
+.. groonga-command
+.. include:: ../example/commands/select/query_expansion_substitute.log
+.. select Entries --match_columns content --query "mroonga"
+.. select Entries --match_columns content --query "mroonga" --query_expansion Thesaurus.synonym
+.. select Entries --match_columns content --query "((mroonga) OR (tritonn) OR (groonga mysql))"
+
+The first ``select`` command doesn't use query expansion. So a record
+that has ``"tritonn"`` isn't found. The second ``select`` command uses
+query expansion. So a record that has ``"tritonn"`` is found. The
+third ``select`` command doesn't use query expansion but it is the same
+as the second ``select`` command. The third one uses the expanded query.
+
+Each substitute value can contain any :doc:`/spec/query_syntax` syntax
+such as ``(...)`` and ``OR``. You can use complex substitution by
+using that syntax.
+
+Here is a complex substitution usage example that uses query syntax.
+
+.. groonga-command
+.. include:: ../example/commands/select/query_expansion_complex.log
+.. load --table Thesaurus
+.. [
+.. {"_key": "popular", "synonym": ["popular", "n_likes:>=10"]}
+.. ]
+.. select Entries --match_columns content --query "popular" --query_expansion Thesaurus.synonym
 
-query_expansionパラメータには、queryパラメータに指定された文字列を置換(拡張)する条件となるテーブル・カラムを指定します。フォーマットは「${テーブル名}.${カラム名}」となります。指定するテーブルは文字列を主キーとするハッシュ型あるいはパトリシア木型のテーブルで、一つ以上の文字列型のカラムが定義されている必要があります。(ここでは置換テーブルと呼びます。)
+The ``load`` command registers a new synonym ``"popular"``. It is
+substituted with ``((popular) OR (n_likes:>=10))``. The substituted
+query means that "popular" entries are entries that contain the word
+"popular" or have 10 or more likes.
 
-queryパラメータに指定された文字列が、指定されたテーブルの主キーと完全一致する場合、その文字列を指定されたカラム値の文字列に置換します。queryパラメータが、空白、括弧、演算子などを含む場合は、その演算子によって区切られた文字列の単位で置換が実行されます。ダブルクォート("")で括られた範囲は、その内部に空白を含んでいても一つの置換される単位と見なされます。検索文字列と置換テーブルの主キー値との比較に際して大文字小文字等を区別したくない場合には、置換テーブルを定義する際にKEY_NORMALIZEを指定します。置換後の文字列となるカラムの値には、括弧や*, ORなど、queryパラメータで利用可能な全ての演算子を指定することができます。
+The ``select`` command outputs records whose ``n_likes`` column value
+is equal to or more than ``10`` from the ``Entries`` table.
 
 Output related parameters
 ^^^^^^^^^^^^^^^^^^^^^^^^^

  Added: doc/source/example/commands/select/query_expansion_complex.log (+52 -0) 100644
===================================================================
--- /dev/null
+++ doc/source/example/commands/select/query_expansion_complex.log    2012-05-20 20:08:57 +0900 (09ab6e4)
@@ -0,0 +1,52 @@
+Execution example::
+
+  load --table Thesaurus
+  [
+  {"_key": "popular", "synonym": ["popular", "n_likes:>=10"]}
+  ]
+  # [[0,1337512074.0285,0.200526237487793],1]
+  select Entries --match_columns content --query "popular" --query_expansion Thesaurus.synonym
+  # [
+  #   [
+  #     0, 
+  #     1337512074.4299, 
+  #     0.000798463821411133
+  #   ], 
+  #   [
+  #     [
+  #       [
+  #         2
+  #       ], 
+  #       [
+  #         [
+  #           "_id", 
+  #           "UInt32"
+  #         ], 
+  #         [
+  #           "_key", 
+  #           "ShortText"
+  #         ], 
+  #         [
+  #           "content", 
+  #           "Text"
+  #         ], 
+  #         [
+  #           "n_likes", 
+  #           "UInt32"
+  #         ]
+  #       ], 
+  #       [
+  #         2, 
+  #         "Groonga", 
+  #         "I started to use groonga. It's very fast!", 
+  #         10
+  #       ], 
+  #       [
+  #         3, 
+  #         "Mroonga", 
+  #         "I also started to use mroonga. It's also very fast! Really fast!", 
+  #         15
+  #       ]
+  #     ]
+  #   ]
+  # ]

  Added: doc/source/example/commands/select/query_expansion_substitute.log (+131 -0) 100644
===================================================================
--- /dev/null
+++ doc/source/example/commands/select/query_expansion_substitute.log    2012-05-20 20:08:57 +0900 (e26ffbc)
@@ -0,0 +1,131 @@
+Execution example::
+
+  select Entries --match_columns content --query "mroonga"
+  # [
+  #   [
+  #     0, 
+  #     1337512073.4185, 
+  #     0.000695228576660156
+  #   ], 
+  #   [
+  #     [
+  #       [
+  #         1
+  #       ], 
+  #       [
+  #         [
+  #           "_id", 
+  #           "UInt32"
+  #         ], 
+  #         [
+  #           "_key", 
+  #           "ShortText"
+  #         ], 
+  #         [
+  #           "content", 
+  #           "Text"
+  #         ], 
+  #         [
+  #           "n_likes", 
+  #           "UInt32"
+  #         ]
+  #       ], 
+  #       [
+  #         3, 
+  #         "Mroonga", 
+  #         "I also started to use mroonga. It's also very fast! Really fast!", 
+  #         15
+  #       ]
+  #     ]
+  #   ]
+  # ]
+  select Entries --match_columns content --query "mroonga" --query_expansion Thesaurus.synonym
+  # [
+  #   [
+  #     0, 
+  #     1337512073.6214, 
+  #     0.000687360763549805
+  #   ], 
+  #   [
+  #     [
+  #       [
+  #         2
+  #       ], 
+  #       [
+  #         [
+  #           "_id", 
+  #           "UInt32"
+  #         ], 
+  #         [
+  #           "_key", 
+  #           "ShortText"
+  #         ], 
+  #         [
+  #           "content", 
+  #           "Text"
+  #         ], 
+  #         [
+  #           "n_likes", 
+  #           "UInt32"
+  #         ]
+  #       ], 
+  #       [
+  #         3, 
+  #         "Mroonga", 
+  #         "I also started to use mroonga. It's also very fast! Really fast!", 
+  #         15
+  #       ], 
+  #       [
+  #         5, 
+  #         "Good-bye Tritonn", 
+  #         "I also migrated all Tritonn system!", 
+  #         6
+  #       ]
+  #     ]
+  #   ]
+  # ]
+  select Entries --match_columns content --query "((mroonga) OR (tritonn) OR (groonga mysql))"
+  # [
+  #   [
+  #     0, 
+  #     1337512073.82467, 
+  #     0.000659942626953125
+  #   ], 
+  #   [
+  #     [
+  #       [
+  #         2
+  #       ], 
+  #       [
+  #         [
+  #           "_id", 
+  #           "UInt32"
+  #         ], 
+  #         [
+  #           "_key", 
+  #           "ShortText"
+  #         ], 
+  #         [
+  #           "content", 
+  #           "Text"
+  #         ], 
+  #         [
+  #           "n_likes", 
+  #           "UInt32"
+  #         ]
+  #       ], 
+  #       [
+  #         3, 
+  #         "Mroonga", 
+  #         "I also started to use mroonga. It's also very fast! Really fast!", 
+  #         15
+  #       ], 
+  #       [
+  #         5, 
+  #         "Good-bye Tritonn", 
+  #         "I also migrated all Tritonn system!", 
+  #         6
+  #       ]
+  #     ]
+  #   ]
+  # ]

  Added: doc/source/example/commands/select/query_expansion_substitution_table.log (+12 -0) 100644
===================================================================
--- /dev/null
+++ doc/source/example/commands/select/query_expansion_substitution_table.log    2012-05-20 20:08:57 +0900 (28cac82)
@@ -0,0 +1,12 @@
+Execution example::
+
+  table_create Thesaurus TABLE_PAT_KEY|KEY_NORMALIZE ShortText
+  # [[0,1337512072.6144,0.000249862670898438],true]
+  column_create Thesaurus synonym COLUMN_VECTOR ShortText
+  # [[0,1337512072.81539,0.000774383544921875],true]
+  load --table Thesaurus
+  [
+  {"_key": "mroonga", "synonym": ["mroonga", "tritonn", "groonga mysql"]},
+  {"_key": "groonga", "synonym": ["groonga", "senna"]}
+  ]
+  # [[0,1337512073.01694,0.200634479522705],2]
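
The substitution rule described in the select.txt change can be illustrated with a small Python sketch. This is not groonga's implementation: ``THESAURUS`` and ``expand_query`` are hypothetical names, the dictionary is only an in-memory stand-in for the ``Thesaurus`` table, ``lower()`` only approximates ``KEY_NORMALIZE``, and the sketch splits the query on whitespace without handling quotes, operators or empty vector values. It only shows that each matching word is replaced once by its synonyms joined with ``OR``, and that the substituted text is not expanded again.

```python
# Hypothetical in-memory stand-in for the Thesaurus substitution table
# (key -> synonym vector), using the same data as the load command in
# the documentation change.
THESAURUS = {
    "mroonga": ["mroonga", "tritonn", "groonga mysql"],
    "groonga": ["groonga", "senna"],
}


def expand_query(query, table):
    """Substitute each whitespace-separated word that is a table key.

    Only words of the original query are looked up, so the words inside
    a substituted value (such as "groonga" in "groonga mysql") are not
    substituted again.
    """
    expanded = []
    for word in query.split():
        # lower() is an assumption that roughly imitates KEY_NORMALIZE.
        synonyms = table.get(word.lower())
        if synonyms is None:
            expanded.append(word)
        else:
            alternatives = " OR ".join("(%s)" % s for s in synonyms)
            expanded.append("(" + alternatives + ")")
    return " ".join(expanded)


print(expand_query("mroonga", THESAURUS))
print(expand_query("Groonga", THESAURUS))
```

With the sample table, the first call prints the same expanded query that the third ``select`` command in the documentation passes explicitly.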
