HorimotoYasuhiro
null+****@clear*****
Wed Oct 18 18:47:57 JST 2017
HorimotoYasuhiro 2017-10-18 18:47:57 +0900 (Wed, 18 Oct 2017) New Revision: 6d513c19c73cffe21c72777c5efae1ddaeee46f6 https://github.com/groonga/groonga.org/commit/6d513c19c73cffe21c72777c5efae1ddaeee46f6 Merged d9f6c33: Merge pull request #34 from komainu8/add_pgroonga_2.0.3_entry Message: blog en: add PGroong 2.0.3 entry Added files: en/_posts/2017-10-10-pgroonga-2.0.3.md Added: en/_posts/2017-10-10-pgroonga-2.0.3.md (+213 -0) 100644 =================================================================== --- /dev/null +++ en/_posts/2017-10-10-pgroonga-2.0.3.md 2017-10-18 18:47:57 +0900 (c857d4b4) @@ -0,0 +1,213 @@ +--- +layout: post.en +title: PGroonga (fast full text search module for PostgreSQL) 2.0.3 has been released +description: PGroonga (fast full text search module for PostgreSQL) 2.0.3 has been released! +--- + +## PGroonga (fast full text search module for PostgreSQL) 2.0.3 has been released + +[PGroonga](https://pgroonga.github.io/) 2.0.3 has been released! PGroonga makes PostgreSQL fast full text search for all languages.This is major version up! + +## About PGroonga + +I will explain about PGroonga. Because it is the first release announcement after the major version upgrade.The highkight is summarized after this. + +PGroonga is a PostgreSQL extension that makes PostgreSQL fast full text search platform for all languages.There are some PostgreSQL extensions that improves full text search feature of PostgreSQL.PGroonga provides rich full text search related features and is very fast. Because PGroonga uses [Groonga](http://groonga.org/) that is a full-fledged full text search engine as backend. + +### Speed + +PGroonga is faster than [pg\_bigm](http://pgbigm.osdn.jp/).PGroonga is faster than [textsearch](https://www.postgresql.jp/document/9.6/html/textsearch.html) contained in PostgreSQL.PGroonga is 10 times over faster about index creation and full text search. + +Here are benchmark results between PGroonga and pg\_bigm. They use Japanese Wikipedia data. + +Extension | Index creation time +----------|-------------------- +PGroonga | About 19m +pg\_bigm | About 33m + +In this case, PGroonga is about 2 times faster than pg\_bigm. + +Next is benchmark results between PGroonga and pg\_bigm.They use English Wikipedia data. Because textsearch isn't support Japanese. + +You can't comparison directly against the above result.Because the amount of data is different. + +Extension | Index creation time +-----------|-------------------- +PGroonga | About 1h 24m +textsearch | About 2h 53m + +In this case, PGroonga is about 2 times faster than textsearch. + +Here is a benchmark result for full text search between PGroonga and pg\_bigm: + +Search keywords | N hits | PGroonga | pg\_bigm +--------------------------------------|-------------|--------------|--------- +"PostgreSQL" or "MySQL" | About 300 | *About 2ms* | About 49ms +データベース(database in Japanese) | About 15000 | *About 49ms* | About 1300ms +テレビアニメ(TV animation in Japanese)| About 20000 | *About 65ms* | About 2800ms +日本(Japan in Japanese | About 530000| About 560ms | *About 480ms* + +In "日本" (Japan in Japanese) case, pg\_bigm is a bit faster(\*1) than PGroonga. But PGroonga is 24 times to 43 times faster than pg\_bigm in other cases. The result shows that PGroonga can perform stable high performance fast full text search against all keywords. + +(\*1) pg\_bigm can perform faster full text search against keywords that have 2 or less characters rather than keywords that have 3 or more characters. + +Here is a benchmark result for full text search between PGroonga and textsearch: + +Search keywords | N hits | PGroonga | textsearch | Groonga +------------------------|--------------|--------------|------------------|-------- +"PostgreSQL" or "MySQL" | About 1600 | About 6ms | *About 3ms* | About 3ms +database | About 210000 | About 698ms | *About 602ms* | About 19ms +animation | About 40000 | *About 173ms*| About 1000ms(\*2)| About 6ms +America | About 470000 | About 1300ms | *About 1200ms* | About 45ms + +(\*2) textsearch is slow because hit about 420 thousand items (about 10 times of PGroonga) with "animation".The cause is because "animation" is stemmed as "anim". + +The search times of PGroonga and textsearch are almost the same. Textsearch is slower in "animation" because it comes from the difference in the number of hits, not the essential search speed difference. + +For reference, the rsult of Groonga.Groonga is the full text search engine of PGroonga.As Groonga searches every few tens ms, you can see that the search speed of PGroonga and textsearch is not the speed of full-text search.So, The impact of common overhead in PostgreSQL is great. + +Details of benchmark is below reference. + + * [PGroonga vs pg\_bigm](https://pgroonga.github.io/reference/pgroonga-versus-pg-bigm.html) + + * [PGroonga vs textsearch vs pg\_trgm](https://pgroonga.github.io/reference/pgroonga-versus-textsearch-and-pg-trgm.html) + +PGroonga provides the following features that aren't provided by other extensions: + + * Normalize feature + * Custom tokenizer feature + * Custom tokenfilter feature + * Search using query language + * HTML highlight feature + * HTML snippet feature + * JSON search + * Autocomplate feature + * Synonyms search feature + * Query expand feature + +Normalize feature is a feature that converts different notation texts to unified notation text. + +For simple example, both "ポスグレ" (HALFWIDTH KATAKANA) and "ポスグレ" (FULLWIDTH KATAKANA) are converted to "ポスグレ" (FULLWIDTH KATAKANA). ("ポスグレ" is an abbreviation of PostgreSQL in Japanese.) +if you search by "ポスグレ"(FULLWIDTH KATAKANA), even texts written in "ポスグレ"(HALFWIDTH KATAKANA) will also be hit. + +For more complex example, "㍊"is converted to"ミリバール".This normalization is based on [Unicode NFKC](http://unicode.org/reports/tr15/) + +Custom tokenizer feature is a feature that customizes search keyword extraction process (tokenization). If you can custom tokenization, you can control trade-off between search precision and search performance. + +For example, if you use "tokenizer that is based of morphological analyzer", you can get better search precision and search performance but may not find some texts. + +FYI: There is no other extension that supports morphological analyzer based tokenizer. PGroonga is the only extension that supports it. + +Custom tokenfilter feature is a function that you can customize the process of processing keywords extracted with tokenizer. textsearch has the same function with the name dictionary. Both PGroonga and textsearch realize the stemming function with this mechanism. + +Search using query language is a function that you can specify AND/OR/NOT search.You can use it in the same grammar as Google. + +HTML highlight feature is a function to mark up search keywords with `<span class =" keyword "> ... </ span>`. The result is safe to use in HTML as it is.It is useful for developing web applications. + +HTML snippet feature is a function to return search keyword surrounding text, which is also used in Google search results. Of course, the keyword is marked up with `<span class =" keywor d "> ... </ span>`. It is also safe to use the result in HTML as it is. + +JSON search feature is a function that allows you to flexibly search by registering the entire column of `jsonb` type as an index. Since it is also possible to search all the texts in JSON in full text, it is easy to save the logs of various structures in JSON format and search later. + +Autocomplete feature is a function to supplement an input keyword in a text box for entering a search keyword. It is also implemented by Google. You can complement it by entering in Roman characters just like Google. + +Synonyms search feature is a function to search for texts whose contents are similar. It can be used to display similar articles on a blog. + +Query expand feature is a function that searches keywords that have the same meaning but different expre****@once***** example, When searching for "牛乳" it will also search for "牛乳" and "ミルク". + +About each these function see also [PHP manual high speed full text search system made with PostgreSQL and PGroonga](https://slide.rabbit-shocker.org/authors/kou/php-conference-2017/) + +Here are features that will be implemented in the feature. They are already implemented in Groonga. + + * Weight feature + +### Usage + +You can use PGroonga without full text search knowledge. You just create an index and puts a condition into `WHERE`: + +```sql +CREATE INDEX index_name ON table USING pgroonga (column); + +SELECT * FROM table WHERE column &@~ '全文検索'; +``` + +You can also use `LIKE` to use PGroonga. PGroonga provides a feature that performs `LIKE` with index. `LIKE` with PGroonga index is faster than `LIKE` without index. It means that you can improve performance without changing your application that uses the following SQL: +But `&@~` is faster than `LIKE`, I recommend you to fix it. + +```sql +SELECT * FROM table WHERE column LIKE '%全文検索%'; +``` + +Are you interested in PGroonga? Please [install](https://pgroonga.github.io/install/) and try [tutorial](https://pgroonga.github.io/tutorial/). You can know all PGroonga features. + +You can install PGroonga easily. Because PGroonga provides packages for major platforms. There are binaries for Windows. + +## Highlight + +Here are highlights form 1.2.3: + + * PostgreSQL 10 supported + + * Improve accuracy of query execution plan (speedup) + + * `pgroonga` schema is deprecated + + * You can also use `pgroonga` scheme. Because PGroonga 2.x is maintain compatibility with PGroonga 1.x. + +### PostgreSQL 10 supported + +PGroonga is supported PostgreSQL 10 by the other day released.You can use PGroonga in latest PostgreSQL! + +PGroonga is supported logical replication. logical replication is new feature for PostgreSQL 10.physical replication was able to be [replication](https://pgroonga.github.io/reference/replication.html), but with PostgreSQL 10 compatible, new choices have increased. + +Physical replication is disk usage is large, but crash recovery will become effective to some extent. + +On the other hand, you can constract more flexible schema by logical replication. For example, master is using only to update purpose and then slave is using only to read purpose.You can make PGroonga's index in only slaves.It makes you can scale out against read while improving to update throughput. + +We are planning to benchmark each replication method. + +About configuring PostgreSQL cluster If you need commercial support, please refer to [PGroonga Support Page](https://pgroonga.github.io/ja/support/). + +### Improve accuracy of query execution plan (speedup) + +Planner of PostgreSQL obtains information from each index including PGroonga, calculates the processing cost, and selects the optimal execution plan.So, If PGroonga return more accurate results, PostgreSQL can selects more high speed plan. + +In this release, a `STABLE` function (if the arguments are the same regardless of the contents of the DB, this function with the same result) or ` IMMUTABLE` function (if the arguments are the same within the same transaction, this function with the same result) Improved accuracy of execution plans when searching using results of these function. The representative of the `IMMUTABLE` function by PGroonga is [` pgroonga_query_expand () `](https://pgroonga.github.io/reference/functions/pgroonga-query-expand.html). This is a function to expand the query and use it for search queries as follows. This makes it easier to select a faster execution plan even when using functions that process search queries. + +```sql +SELECT * + FROM diaries + WHERE content &@~ pgroonga_query_expand('synonyms', 'term', 'synonyms', + 'Search query entered by the user'); +``` + +### `pgroonga` schema is deprecated + +PGroonga 1.x defined functions and operator classes under the `pgroonga` schema, but it was easier to use if you defined under the current schema (in most cases` public`) with prefixes.So, `pgroonga` schema is deprecated in PGroonga 2.x.It was defined with a name with `pgroonga _` prefix. + +`pgroonga` schema is deprecated. But you can also use `pgroonga` schema for maintain compatibility.Since at least PGroonga 2.x supports `pgroonga` schema, please gradually update to usage with` pgroonga _ `prefix. + +## How to upgrade + +This version is compatible with 1.0 or later.You can upgrade by steps in [ "Compatible case" in Upgrade document.](https://pgroonga.github.io/upgrade/) + +## Announce + +### Session + +On Friday, November 3, 2017[PostgreSQL Conference Japan 2017](https://www.postgresql.jp/events/jpug-pgcon2017) there is a session called [PGroonga 2.0 - The definitive edition of full-text search in PostgreSQL](https://www.postgresql.jp/events/jpug-pgcon2017#A4) + +On Friday, December 5, 2017[PGConf.ASIA 2017](http://www.pgconf.asia/JA/2017/) there is session called [PGroonga 2.0 – Make PostgreSQL rich full text search system backend!](http://www.pgconf.asia/JA/2017/day-1/#B5) + +Both session is about PGroonga 2.Session of PostgreSQL Conference Japan 2017 is for people who are not using PGroonga yet.Session of PGConf.ASIA 2017 is for people who already use PGroonga and developing extension functions. + +### Support servies + +We provide [Support of PGroonga](https://pgroonga.github.io/ja/support/).Please consult us when you need support related to PGroonga, such as consulting on index and search design method, investigation at trouble, technical support such as performance improvement, addition of new function etc. + +## Conclusion + +New PGroonga version has been released.PGroonga 2 become it is made to be able to use it more like PostgreSQL. + +See also [release note](https://pgroonga.github.io/news/#version-2-0-3) for all changes. + +Try PGroonga when you want to perform fast full text search against all languages on PostgreSQL! -------------- next part -------------- HTML����������������������������... URL: https://lists.osdn.me/mailman/archives/groonga-commit/attachments/20171018/ea4db9a7/attachment-0001.htm