HAYASHI Kentaro
null+****@clear*****
Fri Jun 28 14:51:34 JST 2013
HAYASHI Kentaro  2013-06-28 14:51:34 +0900 (Fri, 28 Jun 2013)

  New Revision: 685f5a274b33698bdd6a60b898c0d1a55a2f1261
  https://github.com/groonga/groonga.org/commit/685f5a274b33698bdd6a60b898c0d1a55a2f1261

  Message:
    blog en: add 3.0.5 release entry

  Added files:
    en/_posts/2013-06-29-release.textile

  Added: en/_posts/2013-06-29-release.textile (+140 -0) 100644
===================================================================
--- /dev/null
+++ en/_posts/2013-06-29-release.textile    2013-06-28 14:51:34 +0900 (e35ebb8)
@@ -0,0 +1,140 @@
+---
+layout: post.en
+title: Groonga 3.0.5 has been released
+description: Groonga 3.0.5 has been released!
+published: false
+---
+
+
+h2. Groonga 3.0.5 has been released
+
+"Groonga 3.0.5":/docs/news.html#release-3-0-5 has been released!
+
+How to install: "Install":/docs/install.html
+
+There are two topics for this release.
+
+* Supported single quoted string literal in output_columns
+* Supported "html_untag":/docs/reference/functions/html_untag.html function experimentally
+
+h3. Supported single quoted string literal in output_columns
+
+In this release, we began to support single quoted string literals in output_columns.
+
+Since the groonga 3.0.2 release, complex string concatenation in @--output_columns@ has been supported.
+This feature supports expressions such as the following:
+
+<pre>
+'"<" + title + ">"'
+</pre>
+
+Note that 'title' means the 'title' column in this case. The above expression returns @"<(CONTENT OF TITLE)>"@.
+
+However, single quotes were not supported in string literals at that time.
+
+Here is the sample schema and data:
+
+<pre>
+table_create Entries TABLE_NO_KEY
+column_create Entries title COLUMN_SCALAR ShortText
+
+load --table Entries
+[
+ {"title": "Single quote and double quote"}
+]
+</pre>
+
+In the previous release, there were some ways to get @"<(CONTENT OF TITLE)>"@:
+
+* @select Entries --output_columns '_id, "<" + title + ">"' --command_version 2@
+* @select Entries --output_columns "_id, \"<\" + title + \">\"" --command_version 2@
+
+Here is the revised query using single quotes in string literals for groonga 3.0.5:
+
+<pre>
+select Entries --output_columns "_id, '<' + title + '>'" --command_version 2
+</pre>
+
+Now that single quotes are supported, groonga 3.0.5 returns the intended result set even for queries for which groonga 3.0.4 returned an empty result.
+
+Here are sample queries for which groonga 3.0.4 or earlier returns an empty set:
+
+<pre>
+# <"(contents of title column)">
+select Entries --output_columns "_id, '<\"' + title + '\">'" --command_version 2
+#=> [1,"<\"Single quote and double quote\">"]
+
+# <'(contents of title column)'>
+select Entries --output_columns "_id, '<\\'' + title + '\\'>'" --command_version 2
+#=> [1,"<'Single quote and double quote'>"]
+</pre>
+
+h3. Supported html_untag function experimentally
+
+In this release, we began to support, experimentally, the html_untag function, which strips HTML tags.
+
+For example, consider the case where scraped web site HTML is stored in a groonga database.
+
+Here is a sample schema which stores scraped HTML:
+
+<pre>
+table_create WebClips TABLE_NO_KEY
+column_create WebClips url COLUMN_SCALAR ShortText
+column_create WebClips content COLUMN_SCALAR ShortText
+column_create WebClips tag COLUMN_VECTOR ShortText
+</pre>
+
+Here is the sample data:
+
+<pre>
+load --table WebClips
+[
+{"url": "http://groonga.org", "tag": ["groonga"], "content": "groonga is <span class='emphasize'>fast</span>"},
+{"url": "http://mroonga.org", "tag": ["mroonga"], "content": "mroonga is <span class=\"emphasize\">fast</span>"}
+]
+</pre>
+
+Specify a column name as the argument of the html_untag function.
+With the above sample schema, if you want to get the plain text of the content column, use html_untag(content).
+
+Here is a sample query which returns the plain text of the content column:
+
+<pre>
+select WebClips --output_columns "html_untag(content)" --command_version 2
+</pre>
+
+Here is the execution result of the above query:
+
+<pre>
+[[2],
+ [
+ ["html_untag", "null"]
+ ],
+ ["groonga is fast"],
+ ["mroonga is fast"]
+]
+</pre>
+
+You can see that the span tag with a class attribute is removed.
+
+Note that you need to specify @--command_version 2@ if you use the html_untag function.
+Without it, you can't get the intended search results.
+
+There is a reason why html_untag is supported.
+We want to search scraped HTML contents stored in a groonga database, then extract highlighted search results which do not contain noisy HTML tags.
+
+It is intended to be used with the snippet_html function (which isn't supported yet).
+
+Here is the concrete processing flow:
+
+<pre>
+original HTML -(html_untag)-> plain text -(snippet_html)-> highlighted HTML
+</pre>
+
+Combined usage of html_untag and snippet_html isn't supported yet, but it will be supported in a future release.
+
+h3. Conclusion
+
+See "Release 3.0.5 2013/06/29":/docs/news.html#release-3-0-5 for detailed changes since 3.0.4.
+
+Let's search by groonga!
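
Note (not part of the commit): the tag-stripping step that the entry attributes to html_untag can be approximated outside groonga for illustration. The following is a minimal Python sketch using only the standard library. The names TagStripper and untag are hypothetical, and this is an assumption about the observable behaviour shown in the entry's sample output, not groonga's actual implementation.

```python
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Collects only the text nodes of an HTML fragment (hypothetical helper)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        # Called for text between tags; tags and their attributes are skipped.
        self.chunks.append(data)


def untag(html):
    """Rough stand-in for html_untag(content); not groonga's code."""
    parser = TagStripper()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.chunks)


# The two "content" values from the sample data in the entry.
for content in (
    "groonga is <span class='emphasize'>fast</span>",
    'mroonga is <span class="emphasize">fast</span>',
):
    print(untag(content))
```

Run on the two sample content values, the sketch prints "groonga is fast" and "mroonga is fast", matching the rows in the execution result quoted in the entry. How groonga itself treats malformed markup or character references may differ from this approximation.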