ATG Tech Blog: ElasticSearch - NoSQL Search Engine

In a world of large data sets, it is important to give users and customers a simple tool for finding the right information quickly. A search engine meets exactly this requirement. It matters especially in e-commerce, where shoppers have many options to choose from and retail companies have a wide variety of products to offer.

Many search engines are available on the market. Some are commercial, licensed products such as Oracle Endeca, and some are open source, such as Solr and ElasticSearch. Solr and ElasticSearch offer many similar features, and both use Apache Lucene as the core component for indexing data.

What is ElasticSearch?

ElasticSearch is an open source search engine (though it is developed and operated by a single company). It was designed with cloud deployment in mind: the indexed data is distributed across different nodes, and replica nodes cover for the failure of any node. Node synchronization is built in. When a new node is added to the cluster, it is brought up to date, the cluster is rebalanced, and the node then starts serving queries. This makes horizontal scaling easy. ElasticSearch is driven entirely through a REST API and has very high indexing throughput, which makes it suitable for a range of use cases.

Key Points to Know

· In ES, a unit of data is called a "document"; it corresponds to a row in an RDBMS.

· ES is a schema-less search engine.

· An ES document can be structure-less; documents are not required to follow a given structure.

· Each piece of data in a document is called a field; it corresponds to a column of a row in a database.

· We can still describe the document structure in a mapping.

· ES also has indexes and types, which narrow the scope of data changes and searches and in turn improve performance.

· All actions are performed through the REST API, including updating settings.

· All API calls take their data in JSON format.

· ES supports nested documents, as we will see in the examples.
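
To illustrate the point about mappings, here is a minimal sketch in Python that builds a mapping body for an employee document like the one indexed later in this post. The index name `sapientoffice`, the type name `employee`, and all field types are assumptions chosen for illustration. In particular, `text` and `keyword` assume ES 5.x or 6.x (ES 2.x uses `string`), and the type-level nesting assumes a version that still has mapping types.

```python
import json

# Hypothetical mapping for an "employee" type. The field types are
# assumptions: "text"/"keyword" exist from ES 5.x; ES 2.x would use "string".
employee_mapping = {
    "mappings": {
        "employee": {
            "properties": {
                "Name": {"type": "text"},
                # A nested object is described by its own "properties".
                "Address": {
                    "properties": {
                        "Street": {"type": "text"},
                        "City": {"type": "keyword"},
                        "Country": {"type": "keyword"},
                    }
                },
                "Rating": {"type": "float"},
            }
        }
    }
}

# This JSON would be the body of: PUT http://localhost:9200/sapientoffice
mapping_json = json.dumps(employee_mapping, indent=2)
print(mapping_json)
```

Fields that are not listed in the mapping can still be indexed; ES then infers their types from the first document that contains them.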

Best use cases of ElasticSearch

ElasticSearch is used in many different technology stacks. Because it is widely popular for text-based search, it serves as the search engine in log analysis stacks such as ELK (ElasticSearch, Logstash, Kibana), which are used to analyze data for trends and reports.

Because of its high write throughput, it is also used as the search layer for NoSQL data, with a NoSQL database such as Cassandra as the back end. ES fills the gap in the database's search capability: thanks to its very high indexing throughput, data changes are consumed quickly and easily. ElasticSearch has plugins that synchronize data between the two.

It is also used in e-commerce as a search engine, since it supports facets (aggregations), autocomplete, and fuzzy search. It is not yet as popular there as Solr and Endeca, but a few retail sites are gradually adopting ElasticSearch.

Getting Started with ES

Before getting started, install the Java JRE (ElasticSearch is developed in Java) and a tool for creating REST requests, such as Fiddler or a POST plugin for the browser.

Download ElasticSearch from the ElasticSearch site.
Unzip the archive. It contains folders such as bin, config, data, lib, logs, modules, and plugins.

The config folder contains 'elasticsearch.yml', which provides the configuration for nodes and backups.
The bin folder contains the bat files that start the search engine.

To start the search engine, go to the bin folder and run the elasticsearch.bat file.
The default port is 9200 (it can be changed in elasticsearch.yml).
Open a browser and go to http://localhost:9200/
You should see a page with cluster information, including the lucene_version.
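
The same check can be scripted instead of done in a browser. The sketch below uses only the Python standard library; the function names `fetch_cluster_info` and `lucene_version` are made up for this example, and it assumes a node is listening on the default port when `fetch_cluster_info` is called.

```python
import json
from urllib.request import urlopen


def fetch_cluster_info(base_url="http://localhost:9200/"):
    """Fetch the JSON document that ES serves at its root URL."""
    with urlopen(base_url) as response:
        return json.loads(response.read().decode("utf-8"))


def lucene_version(info):
    """Read lucene_version from the "version" object of the root response."""
    return info["version"]["lucene_version"]


# Usage, once the node is running:
#   print(lucene_version(fetch_cluster_info()))
```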

The ES engine is now up and running with the default settings. The next step is to load data to search.

Data Indexing

Since ES is an API-based engine, there are two kinds of API for uploading data:

· Single-document create, update, and delete.

· Bulk create, update, and delete.

Single document

ES has APIs that act on a single document; they are used when we operate on one document at a time.

curl -XPOST 'http://localhost:9200/<index>/<type>/1' -d '{
  "Id": "1",
  "Name": "Pradeep",
  "Address": {
    "Street": "sapient office",
    "City": "Bangalore",
    "Zip code": "560098",
    "Country": "India"
  },
  "Location": [34.05, -118.98],
  "Rating": 4.5
}'

Here the index could be, for example, sapientoffice (ES requires index names to be lowercase) and the type employee. The curl command above creates a new document or updates the existing document with id=1; both operations use the POST method.

Address is an example of a nested document, which ES supports.

curl -XDELETE 'http://localhost:9200/<index>/<type>/1' deletes the document from ES.
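
For readers who prefer a script to curl, the two single-document calls can be sketched with the Python standard library. The helper names `index_request` and `delete_request` are hypothetical, as are the index name `sapientoffice` and the type `employee`. The requests are only built here, not sent; passing one to `urllib.request.urlopen` would send it. The Content-Type header is an assumption that newer ES versions require and older ones accept.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:9200"  # default port mentioned in this post


def index_request(index, doc_type, doc_id, document):
    """Build the POST request that creates or updates one document."""
    url = f"{BASE_URL}/{index}/{doc_type}/{doc_id}"
    body = json.dumps(document).encode("utf-8")
    return Request(url, data=body, method="POST",
                   headers={"Content-Type": "application/json"})


def delete_request(index, doc_type, doc_id):
    """Build the DELETE request that removes one document."""
    return Request(f"{BASE_URL}/{index}/{doc_type}/{doc_id}", method="DELETE")


employee = {
    "Id": "1",
    "Name": "Pradeep",
    "Address": {"Street": "sapient office", "City": "Bangalore",
                "Zip code": "560098", "Country": "India"},
    "Location": [34.05, -118.98],
    "Rating": 4.5,
}

req = index_request("sapientoffice", "employee", 1, employee)
print(req.get_method(), req.full_url)
```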

Bulk Document

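ES also provides an API for bulk upload of data for indexing. Its syntax is shown below. The body is not a single JSON object but one JSON object per line, and it must end with a newline, so we send it with --data-binary, which preserves the line breaks.

curl -XPOST 'http://localhost:9200/<index>/<type>/_bulk' --data-binary '{"index":{}}
{"Name": "Pradeep", "Address": {"Street": "sapient office", "City": "Bangalore", "Zip code": "560098", "Country": "India"}, "Location": [34.05, -118.98], "Rating": 4.5}
'

Each additional document adds another {"index":{}} action line followed by the document itself on the next line.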

As before, the index could be sapientoffice and the type employee. The curl command above creates new documents (or replaces existing ones when the action line specifies an _id); both operations use the POST method.

curl -XDELETE 'http://localhost:9200/<index>/<type>/' deletes all the documents under the given type from ES.

Now that we know how to load data into ES, let us see how to get data out of it.
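
Building the bulk body by hand is error-prone, so here is a small Python sketch that generates it. The function name `bulk_body` is made up for this example. It pairs every document with an `{"index": {}}` action line and terminates the body with the newline that the bulk API expects.

```python
import json


def bulk_body(documents):
    """Return a newline-delimited bulk body: one action line per document."""
    lines = []
    for doc in documents:
        lines.append(json.dumps({"index": {}}))  # action line, id auto-generated
        lines.append(json.dumps(doc))            # the document source itself
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


docs = [
    {"Name": "Pradeep", "Rating": 4.5},
    {"Name": "Second employee (placeholder)", "Rating": 3.9},
]
body = bulk_body(docs)
print(body)
```

`json.dumps` without `indent` never emits raw line breaks, so each object stays on a single line as the format requires.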

ES Query

Two key qualities of a search engine are how fast it retrieves data and how relevant that data is. ES provides several query syntaxes for fetching data, which can be adapted to suit our requirements.

Queries are again made through API calls, with both request and response in JSON format. ES provides a rich, flexible query language called the query DSL (domain-specific language), which allows us to build much more complicated, robust queries.

All search-related queries go through the "_search" API endpoint.

Let us look at different kinds of queries.

1. The query below returns all documents across all types and all indexes.

curl -XGET 'http://localhost:9200/_search' -d '{
  "query": {
    "match_all": {}
  }
}'

2. The query below returns all documents across all types of the index <index>.

curl -XGET 'http://localhost:9200/<index>/_search' -d '{
  "query": {
    "match_all": {}
  }
}'

3. The query below returns all documents of the type <type> in the index <index>.

curl -XGET 'http://localhost:9200/<index>/<type>/_search' -d '{
  "query": {
    "match_all": {}
  }
}'
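
The first three queries differ only in how much of the URL path is given, which a short Python sketch can make explicit. The helper name `search_url` is hypothetical, as are the index and type names in the usage lines.

```python
MATCH_ALL = {"query": {"match_all": {}}}  # body shared by queries 1 to 3


def search_url(base_url, index=None, doc_type=None):
    """Build a _search URL scoped to everything, one index, or one type."""
    if doc_type and not index:
        raise ValueError("a type can only be given together with an index")
    parts = [base_url.rstrip("/")]
    if index:
        parts.append(index)
    if doc_type:
        parts.append(doc_type)
    parts.append("_search")
    return "/".join(parts)


base = "http://localhost:9200"
print(search_url(base))                               # all indexes, all types
print(search_url(base, "sapientoffice"))              # one index, all types
print(search_url(base, "sapientoffice", "employee"))  # one type of one index
```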

4. The query below returns all documents of the type <type> in the index <index> that contain the search term "Pradeep" anywhere (in any field) in the document.

curl -XGET 'http://localhost:9200/<index>/<type>/_search' -d '{
  "query": {
    "query_string": {
      "query": "Pradeep"
    }
  }
}'

5. The query below returns all documents of the type <type> in the index <index> that contain the search term "Pradeep" in the Name field or in the address's Street field. Field names are case-sensitive, so they are written as they appear in the indexed document.

curl -XGET 'http://localhost:9200/<index>/<type>/_search' -d '{
  "query": {
    "query_string": {
      "query": "Pradeep",
      "fields": ["Name", "Address.Street"]
    }
  }
}'
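
Queries 4 and 5 share one shape, with the field list as the only difference. A minimal Python sketch of a builder for that body follows; the function name `query_string` is chosen to mirror the DSL clause and is not part of any client library.

```python
import json


def query_string(term, fields=None):
    """Build a query_string search body, optionally limited to some fields."""
    clause = {"query": term}
    if fields:
        # Nested fields are addressed with dot notation, e.g. Address.Street.
        clause["fields"] = list(fields)
    return {"query": {"query_string": clause}}


anywhere = query_string("Pradeep")
scoped = query_string("Pradeep", ["Name", "Address.Street"])
print(json.dumps(scoped, indent=2))
```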

Using Filter (Provide boundary for search)

6. The query below returns all documents of the type <type> in the index <index> that contain the search term "Pradeep" in the Name field or in the address's Street field and that also have a rating of 4.0 or higher.

curl -XGET 'http://localhost:9200/<index>/<type>/_search' -d '{
  "query": {
    "filtered": {
      "query": {
        "query_string": {
          "query": "Pradeep",
          "fields": ["Name", "Address.Street"]
        }
      },
      "filter": {
        "range": {
          "Rating": { "gte": 4.0 }
        }
      }
    }
  }
}'
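
The filtered query used above was deprecated in ES 2.0 and removed in ES 5.0, where a bool query with a filter clause takes its place. The Python sketch below builds both forms from the same two parts so they can be compared side by side; the function names are made up for this example.

```python
import json


def filtered_query(query, filter_clause):
    """Body in the older 'filtered' form used in this post (ES 1.x/2.x)."""
    return {"query": {"filtered": {"query": query, "filter": filter_clause}}}


def bool_filter_query(query, filter_clause):
    """Equivalent body for ES 5.x and later: bool with must and filter."""
    return {"query": {"bool": {"must": query, "filter": filter_clause}}}


text_part = {"query_string": {"query": "Pradeep",
                              "fields": ["Name", "Address.Street"]}}
rating_part = {"range": {"Rating": {"gte": 4.0}}}

old_body = filtered_query(text_part, rating_part)
new_body = bool_filter_query(text_part, rating_part)
print(json.dumps(new_body, indent=2))
```

In both forms the filter only narrows the result set and does not contribute to the relevance score, which is why range conditions like the rating check belong there.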

7. The query below returns all documents of the type <type> in the index <index> that have a rating of 4.0 or higher.

curl -XGET 'http://localhost:9200/<index>/<type>/_search' -d '{
  "query": {
    "filtered": {
      "filter": {
        "range": {
          "Rating": { "gte": 4.0 }
        }
      }
    }
  }
}'

Monday, December 26, 2016
