null+****@clear*****
Tue, 24 Jul 2012 17:51:43 JST
Yoji SHIDARA	2012-07-24 17:51:43 +0900 (Tue, 24 Jul 2012)

  New Revision: 2eeeac9c39d772c7719b6bd789bfc03c6de6cf78
  https://github.com/groonga/gcs.groonga.org/commit/2eeeac9c39d772c7719b6bd789bfc03c6de6cf78

  Log:
    Working draft of aws_cloud_search article. #2

  Added files:
    _posts/2012-07-26-work-with-aws-cloud-search-gem.md

  Added: _posts/2012-07-26-work-with-aws-cloud-search-gem.md (+220 -0) 100644
===================================================================
--- /dev/null
+++ _posts/2012-07-26-work-with-aws-cloud-search-gem.md    2012-07-24 17:51:43 +0900 (ef24d4d)
@@ -0,0 +1,220 @@
+---
+title: Using Groonga CloudSearch with aws_cloud_search gem
+layout: post
+published: false
+---
+### Introduction
+
+This article describes how to use Groonga CloudSearch with [aws\_cloud\_search][].
+
+Groonga CloudSearch is an open source full text search server that is compatible with Amazon CloudSearch.
+With Groonga CloudSearch, you can try the Amazon CloudSearch APIs on your local machine.
+
+[aws\_cloud\_search][] is a Ruby library (gem) which wraps the Amazon CloudSearch APIs. You can use aws\_cloud\_search to index your documents and search them. Although aws\_cloud\_search itself does not support Groonga CloudSearch, a small hack (monkey patching) lets us direct its requests to Groonga CloudSearch instead of Amazon CloudSearch. That means we can use the aws\_cloud\_search gem with Groonga CloudSearch.
+
+### Prerequisites
+
+In this article, we assume that you
+
+ * are working in a \*NIX environment, such as Mac OS X or Linux.
+ * have finished [the tutorial of Groonga CloudSearch][tutorial].
+ * have [Ruby][ruby] set up and have basic knowledge of [Ruby][ruby].
+
+### Prepare Groonga CloudSearch and example documents
+
+First of all, let's try searching with Groonga CloudSearch and aws\_cloud\_search. For simplicity, this section searches the documents prepared in the [tutorial][]. You need to finish the [tutorial][] before you proceed. How to index your own documents with aws\_cloud\_search will be described in a later section.
+
+### Set up aws\_cloud\_search
+
+Install aws\_cloud\_search. We use [RubyGems][rubygems]. Run `gem install aws_cloud_search` in your terminal.
+
+    $ gem install aws_cloud_search
+    Successfully installed aws_cloud_search-0.0.2
+    1 gem installed
+    Installing ri documentation for aws_cloud_search-0.0.2...
+    Installing RDoc documentation for aws_cloud_search-0.0.2...
+
+
+### Prepare a script to direct the requests to Groonga CloudSearch
+
+As the URLs that aws\_cloud\_search connects to are hard-coded,
+we need a small patch to modify them.
+
+Save the following code as `direct_to_local_gcs.rb`.
+
+    # A small hack to use Groonga CloudSearch.
+    # We override these three methods to direct requests to Groonga CloudSearch
+    # running on localhost:7575.
+    # We use http://xip.io/, which provides wildcard DNS for any IP address.
+    module AWSCloudSearch
+      def self.search_url(domain, region="us-east-1")
+        "http://search-#{domain}.#{region}.127.0.0.1.xip.io:7575"
+      end
+
+      def self.document_url(domain, region="us-east-1")
+        "http://doc-#{domain}.#{region}.127.0.0.1.xip.io:7575"
+      end
+
+      def self.configuration_url
+        "http://cloudsearch.us-east-1.127.0.0.1.xip.io:7575"
+      end
+    end
+
+This code overrides aws\_cloud\_search to direct its requests to Groonga CloudSearch, which is running on `localhost:7575`.
+
+### Search the documents
+
+To illustrate how to make search requests to Groonga CloudSearch with aws\_cloud\_search, we create a small script that searches the `example` domain on `localhost:7575`, which was created in the [tutorial][].
+
+Save the following code as `search.rb`.
+
+    #!/usr/bin/env ruby
+
+    require 'aws_cloud_search'
+    require './direct_to_local_gcs' # direct requests to localhost:7575
+
+    # Create a CloudSearch object that corresponds to the example domain.
+    domain_name = 'example-00000000000000000000000000'
+    cloud_search = AWSCloudSearch::CloudSearch.new(domain_name)
+
+    # Take a query string from the command line arguments.
+    query = ARGV.join(' ')
+
+    # Create a search request object for the query.
+    search_request = AWSCloudSearch::SearchRequest.new
+    search_request.q = query
+
+    # Issue the request.
+    search_response = cloud_search.search(search_request)
+
+    # Show the results.
+    puts "#{search_response.found} documents are found for the query '#{query}':"
+
+    search_response.hits.each do |hit|
+      p hit
+    end
+
+You can run the search with `ruby search.rb [query]`.
+Don't forget to start the gcs server on `localhost:7575` beforehand (see the [tutorial][] for details).
+
+The output should look like the following:
+
+    $ ruby search.rb tokyo
+    3 documents are found for the query 'tokyo':
+    {"id"=>"id1", "data"=>{"_id"=>[1], "_key"=>["id1"], "address"=>["Shibuya, Tokyo, Japan"], "email_address"=>["info****@razil*****"], "name"=>["Brazil"]}}
+    {"id"=>"id3", "data"=>{"_id"=>[3], "_key"=>["id3"], "address"=>["Hongo, Tokyo, Japan"], "email_address"=>["info****@clear*****"], "name"=>["ClearCode Inc."]}}
+    {"id"=>"id9", "data"=>{"_id"=>[9], "_key"=>["id9"], "address"=>["Tokyo, Japan"], "email_address"=>[""], "name"=>["Umbrella Corporation"]}}
+
+It works. You can modify this script to fit your needs.
+
+### Index your documents
+
+This section describes how to index your documents with aws\_cloud\_search.
+As an example, we create a simple command-line tool that indexes an entry given as
+command line arguments.
+
+Save the following code as `index.rb`.
+
+    #!/usr/bin/env ruby
+
+    require 'aws_cloud_search'
+    require './direct_to_local_gcs' # direct requests to localhost:7575
+
+    # Create a CloudSearch object that corresponds to the example domain.
+    domain_name = 'example-00000000000000000000000000'
+    cloud_search = AWSCloudSearch::CloudSearch.new(domain_name)
+
+    # Take the data from command line arguments.
+    id, name, address, email_address = ARGV
+
+    # Create a document to be indexed.
+    document = AWSCloudSearch::Document.new
+
+    document.id = id
+    document.add_field :name, name
+    document.add_field :address, address
+    document.add_field :email_address, email_address
+
+    # Create a batch to index the document.
+    batch = AWSCloudSearch::DocumentBatch.new
+    batch.add_document document
+
+    # Issue the request.
+    response = cloud_search.documents_batch(batch)
+
+    # Show the response.
+    p response
+
+The script `index.rb` takes four arguments: `id`, `name`, `address` and `email_address`.
+Let us try to index a document.
+
+    $ ruby index.rb id11 "Snowy Corporation" "Tokyo, Japan" snowy****@examp*****
+    {"status"=>"success", "adds"=>1, "deletes"=>0}
+
+The document is successfully indexed.
+
+Search for `tokyo` again to check that the new document is searchable.
+
+    $ ruby search.rb tokyo
+    4 documents are found for the query 'tokyo':
+    {"id"=>"id1", "data"=>{"_id"=>[1], "_key"=>["id1"], "address"=>["Shibuya, Tokyo, Japan"], "email_address"=>["info****@razil*****"], "name"=>["Brazil"]}}
+    {"id"=>"id3", "data"=>{"_id"=>[3], "_key"=>["id3"], "address"=>["Hongo, Tokyo, Japan"], "email_address"=>["info****@clear*****"], "name"=>["ClearCode Inc."]}}
+    {"id"=>"id9", "data"=>{"_id"=>[9], "_key"=>["id9"], "address"=>["Tokyo, Japan"], "email_address"=>[""], "name"=>["Umbrella Corporation"]}}
+    {"id"=>"id11", "data"=>{"_id"=>[20], "_key"=>["id11"], "address"=>["Tokyo, Japan"], "email_address"=>["snowy****@examp*****"], "name"=>["Snowy Corporation"]}}
+
+The number of hits has increased from 3 to 4 because the results now include the new document. The last hit is the document we indexed with the `index.rb` script.
+
+### Remove the documents
+
+Removing a document is done in much the same way as indexing.
+Save the following code as `delete.rb`.
+
+    #!/usr/bin/env ruby
+
+    require 'aws_cloud_search'
+    require './direct_to_local_gcs' # direct requests to localhost:7575
+
+    # Create a CloudSearch object that corresponds to the example domain.
+    domain_name = 'example-00000000000000000000000000'
+    cloud_search = AWSCloudSearch::CloudSearch.new(domain_name)
+
+    # Take the id of the document to be deleted from the command line argument.
+    id = ARGV.shift
+
+    # Create a document to be deleted.
+    document = AWSCloudSearch::Document.new
+    document.id = id
+
+    # Create a batch to remove the document.
+    batch = AWSCloudSearch::DocumentBatch.new
+    batch.delete_document document
+
+    # Issue the request.
+    response = cloud_search.documents_batch(batch)
+
+    # Show the response.
+    p response
+
+To delete the document with id = `id11` (the document added in the previous section), run `ruby delete.rb id11`.
+
+    $ ruby delete.rb id11
+    {"status"=>"success", "adds"=>0, "deletes"=>1}
+
+The removed entry, `Snowy Corporation`, no longer appears in the search results.
+
+    $ ruby search.rb tokyo
+    3 documents are found for the query 'tokyo':
+    {"id"=>"id1", "data"=>{"_id"=>[1], "_key"=>["id1"], "address"=>["Shibuya, Tokyo, Japan"], "email_address"=>["info****@razil*****"], "name"=>["Brazil"]}}
+    {"id"=>"id3", "data"=>{"_id"=>[3], "_key"=>["id3"], "address"=>["Hongo, Tokyo, Japan"], "email_address"=>["info****@clear*****"], "name"=>["ClearCode Inc."]}}
+    {"id"=>"id9", "data"=>{"_id"=>[9], "_key"=>["id9"], "address"=>["Tokyo, Japan"], "email_address"=>[""], "name"=>["Umbrella Corporation"]}}
+
+### NOTE Do we need a more realistic example like a Sinatra app?
+
+### Summary
+
+TODO
+
+  [aws\_cloud\_search]: https://github.com/spokesoftware/aws\_cloud\_search
+  [tutorial]: /docs/tutorial/
+  [ruby]: http://www.ruby-lang.org/en/
+  [rubygems]: http://rubygems.org/
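
The URL override in the article above can be checked in isolation before pointing the gem at a local server. The following is a minimal sketch under stated assumptions: `LocalGcsUrls` is a hypothetical stand-in module (not part of aws\_cloud\_search) that repeats the same string templates for the search and document endpoints, so it runs without the gem or a running gcs server. It only builds and parses the URLs; it does not verify that xip.io resolves them.

```ruby
require 'uri'

# Hypothetical stand-in for the patched methods in the article.
# It is NOT the gem's module; it only reproduces the URL templates.
module LocalGcsUrls
  HOST_SUFFIX = '127.0.0.1.xip.io' # wildcard DNS name that maps to 127.0.0.1
  PORT = 7575                      # port used for gcs in the article

  def self.search_url(domain, region = 'us-east-1')
    "http://search-#{domain}.#{region}.#{HOST_SUFFIX}:#{PORT}"
  end

  def self.document_url(domain, region = 'us-east-1')
    "http://doc-#{domain}.#{region}.#{HOST_SUFFIX}:#{PORT}"
  end
end

# Domain name taken from the article's scripts.
domain_name = 'example-00000000000000000000000000'

urls = [
  LocalGcsUrls.search_url(domain_name),
  LocalGcsUrls.document_url(domain_name)
]

# Parse each URL to confirm it is well-formed and show its parts.
urls.each do |url|
  uri = URI.parse(url)
  puts "#{uri.scheme}://#{uri.host} (port #{uri.port})"
end
```

Keeping the host suffix and port in constants makes it easy to try a different local port; the endpoint-specific prefix (`search-` or `doc-`) stays in the host name either way.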