[Groonga-commit] groonga/groonga.org at 685f5a2 [gh-pages] blog en: add 3.0.5 release entry


HAYASHI Kentaro null+****@clear*****
Fri Jun 28 14:51:34 JST 2013


HAYASHI Kentaro	2013-06-28 14:51:34 +0900 (Fri, 28 Jun 2013)

  New Revision: 685f5a274b33698bdd6a60b898c0d1a55a2f1261
  https://github.com/groonga/groonga.org/commit/685f5a274b33698bdd6a60b898c0d1a55a2f1261

  Message:
    blog en: add 3.0.5 release entry

  Added files:
    en/_posts/2013-06-29-release.textile

  Added: en/_posts/2013-06-29-release.textile (+140 -0) 100644
===================================================================
--- /dev/null
+++ en/_posts/2013-06-29-release.textile    2013-06-28 14:51:34 +0900 (e35ebb8)
@@ -0,0 +1,140 @@
+---
+layout: post.en
+title: Groonga 3.0.5 has been released
+description: Groonga 3.0.5 has been released!
+published: false
+---
+
+
+h2. Groonga 3.0.5 has been released
+
+"Groonga 3.0.5":/docs/news.html#release-3-0-5 has been released!
+
+How to install: "Install":/docs/install.html
+
+This release has two main topics.
+
+* Support for single-quoted string literals in output_columns
+* Experimental support for the "html_untag":/docs/reference/functions/html_untag.html function
+
+h3. Support for single-quoted string literals in output_columns
+
+This release adds support for single-quoted string literals in output_columns.
+
+Since the groonga 3.0.2 release, complex string concatenation in @--output_columns@ has been supported.
+This feature supports expressions such as the following:
+
+<pre>
+'"<" + title + ">"'
+</pre>
+
+Note that @title@ refers to the title column in this case. The above expression returns @"<(CONTENT OF TITLE)>"@.
+
+However, single quotes were not supported in string literals at that time.
+
+Here is the sample schema and data:
+
+<pre>
+table_create Entries TABLE_NO_KEY
+column_create Entries title COLUMN_SCALAR ShortText
+
+load --table Entries
+[
+ {"title": "Single quote and double quote"}
+]
+</pre>
+ 
+In previous releases, there were two ways to get @"<(CONTENT OF TITLE)>"@:
+
+* @select Entries --output_columns '_id, "<" + title + ">"' --command_version 2@
+* @select Entries --output_columns "_id, \"<\" + title + \">\"" --command_version 2@
+
+Here is the equivalent query for groonga 3.0.5, using single quotes in the string literals:
+
+<pre>
+select Entries --output_columns "_id, '<' + title + '>'" --command_version 2
+</pre>
+
+Now that single quotes are supported, groonga 3.0.5 returns the intended result set even for queries for which groonga 3.0.4 returned an empty result.
+
+Here are sample queries for which groonga 3.0.4 or earlier returns an empty set:
+
+<pre>
+# <"(contents of title column)">
+select Entries --output_columns "_id, '<\"' + title + '\">'" --command_version 2
+#=> [1,"<\"Single quote and double quote\">"]
+
+# <'(contents of title column)'>
+select Entries --output_columns "_id, '<\\'' + title + '\\'>'" --command_version 2
+#=> [1,"<'Single quote and double quote'>"]
+</pre>
+
+h3. Experimental support for the html_untag function
+
+This release adds experimental support for the html_untag function, which strips HTML tags.
+
+For example, consider the case where HTML scraped from web sites is stored in a groonga database.
+
+Here is a sample schema for storing scraped HTML:
+
+<pre>
+table_create WebClips TABLE_NO_KEY
+column_create WebClips url COLUMN_SCALAR ShortText
+column_create WebClips content COLUMN_SCALAR ShortText
+column_create WebClips tag COLUMN_VECTOR ShortText
+</pre>
+    
+Here is the sample data:
+
+<pre>
+load --table WebClips
+[
+{"url": "http://groonga.org", "tag": ["groonga"], "content": "groonga is <span class='emphasize'>fast</span>"},
+{"url": "http://mroonga.org", "tag": ["mroonga"], "content": "mroonga is <span class=\"emphasize\">fast</span>"}
+]
+</pre>
+
+Specify a column name as the argument of the html_untag function.
+With the above sample schema, use html_untag(content) to get the plain text of the content column.
+
+Here is a sample query which returns the plain text of the content column:
+
+<pre>
+select WebClips --output_columns "html_untag(content)" --command_version 2
+</pre>
+
+Here is the result of the above query:
+
+<pre>
+[[2],
+  [
+    ["html_untag", "null"]
+  ],
+  ["groonga is fast"],
+  ["mroonga is fast"]
+]
+</pre>
+
+You can see that the span tags, including their class attributes, are removed.
+
+Note that you need to specify @--command_version 2@ to use the html_untag function.
+Without it, you won't get the intended search results.
+
+The html_untag function was added to meet a specific need.
+We want to search scraped HTML content stored in a groonga database, then extract highlighted search results that are free of noisy HTML tags.
+
+It is intended to be used together with the snippet_html function (combined use isn't supported yet).
+
+Here is the concrete processing flow:
+
+<pre>
+original HTML -(html_untag)-> plain text -(snippet_html)-> highlighted HTML
+</pre>
+
+Combined use of html_untag and snippet_html isn't supported yet, but it will be supported in a future release.
+
+h3. Conclusion
+
+See "Release 3.0.5 2013/06/29":/docs/news.html#release-3-0-5 for detailed changes since 3.0.4.
+
+Let's search by groonga!
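
The tag stripping that the entry above attributes to html_untag can be approximated outside groonga to check what plain text to expect from the two sample WebClips rows. The following is only a rough sketch in Python using the standard library's html.parser module. It is not groonga's implementation, and the names TagStripper and strip_tags are hypothetical helpers introduced here for illustration.

```python
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Hypothetical helper: collects text nodes and ignores all tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        # Called for text between tags; tags and attributes are skipped.
        self.parts.append(data)


def strip_tags(html):
    """Return the text content of an HTML fragment (illustration only)."""
    stripper = TagStripper()
    stripper.feed(html)
    stripper.close()  # flush any text still buffered by the parser
    return "".join(stripper.parts)


# The "content" values from the sample data in the blog entry.
print(strip_tags("groonga is <span class='emphasize'>fast</span>"))
# prints: groonga is fast
print(strip_tags('mroonga is <span class="emphasize">fast</span>'))
# prints: mroonga is fast
```

Both quoting styles of the class attribute give the same plain text, which matches the result set shown in the entry.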