The Core ElasticSearch Operations

As explained in the previous article, where we introduced ElasticSearch, ElasticSearch is driven by a RESTful API: almost every action can be performed by sending JSON over HTTP. For this reason, today we will carry out the core ElasticSearch operations with curl (the command-line tool of the client URL library) in a hypothetical example that indexes articles published on kodcu.com, each with a title, content, publication date, tags, and author.

 

Creating an index

The Create Index API creates a new index. ElasticSearch also supports multiple indices and can execute operations that span several indices. Custom settings can be provided for each index when it is created.

hakdogan:~ hakdogan$ curl -XPUT 'http://localhost:9200/kodcucom/' -d '
> index:
>     number_of_shards: 2
>     number_of_replicas: 1
> '

With this command, we created an index named kodcucom with 2 primary shards and 1 replica. The settings in the request body are written in YAML, which ElasticSearch accepts as well as JSON.
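The same request can also be issued from code. Below is a minimal Python sketch using only the standard library; it builds the request without sending it. It assumes a node at http://localhost:9200 (ES_URL) and uses a JSON settings body ({"settings": {...}}) instead of the YAML one; the function name create_index_request is hypothetical. The Content-Type header is not needed by the curl call above, but newer ElasticSearch versions require it.

```python
import json
from urllib.request import Request

# Assumption: a single ElasticSearch node listening on the default port.
ES_URL = "http://localhost:9200"


def create_index_request(name, shards, replicas):
    """Build (but do not send) a Create Index request with custom settings."""
    settings = {"settings": {"number_of_shards": shards, "number_of_replicas": replicas}}
    return Request(
        f"{ES_URL}/{name}/",
        data=json.dumps(settings).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = create_index_request("kodcucom", shards=2, replicas=1)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Against a running node, passing the request to urllib.request.urlopen(req) would send it and return the response.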

A shard in ElasticSearch is a single Lucene instance, and shards are managed automatically. By default, an index has 5 primary shards. You can change this default in the config/elasticsearch.yml file, and, as this example shows, you can also set the number for each index when you create it. However, the number of primary shards cannot be changed once the index has been created; the number of replicas, by contrast, can be changed later.
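The reason the primary shard count is fixed is routing: ElasticSearch picks a document's shard with shard = hash(routing) % number_of_primary_shards, where the routing value defaults to the document ID. The Python sketch below illustrates this with zlib.crc32 as a stand-in hash (an assumption; it is not the hash ElasticSearch uses). Changing the shard count would send existing documents to different shards, so documents indexed earlier could no longer be found where the formula now points.

```python
import zlib


def shard_for(routing: str, num_primary_shards: int) -> int:
    """Return the primary shard for a routing value (the document ID by default).

    Mirrors shard = hash(routing) % number_of_primary_shards; zlib.crc32 is
    only a stand-in for the hash function ElasticSearch really uses.
    """
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


doc_ids = [str(i) for i in range(1, 11)]
with_two = {doc_id: shard_for(doc_id, 2) for doc_id in doc_ids}
with_three = {doc_id: shard_for(doc_id, 3) for doc_id in doc_ids}

# Documents that would be looked up on a different shard if the index
# suddenly had 3 primary shards instead of the 2 they were written to.
moved = [doc_id for doc_id in doc_ids if with_two[doc_id] != with_three[doc_id]]
print(f"{len(moved)} of {len(doc_ids)} documents would be routed elsewhere")
```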

ElasticSearch distributes shards across all nodes in the cluster, and when a node fails or a new node joins, it relocates shards automatically.
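A toy model can make the placement rule concrete. ElasticSearch never places a replica on the same node as its primary, since both copies would then be lost together. The Python sketch below is a deliberately simplified allocator, not ElasticSearch's actual algorithm (which, among other things, promotes a surviving replica to primary when a node fails); the function allocate and the node names are hypothetical.

```python
def allocate(num_primaries, num_replicas, nodes):
    """Toy allocation: copy c of shard s goes to node (s + c) % len(nodes).

    Copy 0 is the primary, the others are replicas. Because c < len(nodes),
    two copies of the same shard never share a node; copies that cannot be
    placed on a separate node stay unassigned.
    """
    allocation = {node: [] for node in nodes}
    unassigned = []
    for shard in range(num_primaries):
        for copy in range(num_replicas + 1):
            role = "primary" if copy == 0 else "replica"
            if copy >= len(nodes):
                unassigned.append((shard, role))
            else:
                allocation[nodes[(shard + copy) % len(nodes)]].append((shard, role))
    return allocation, unassigned


# The kodcucom settings: 2 primary shards with 1 replica each, on 3 nodes.
nodes = ["node-a", "node-b", "node-c"]
before, _ = allocate(2, 1, nodes)

# node-b fails: in this toy model, allocate again over the surviving nodes.
after, still_unassigned = allocate(2, 1, [n for n in nodes if n != "node-b"])
print(before)
print(after, still_unassigned)
```

With a single node left, the replicas end up in the unassigned list, since no second node is available to hold them.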

As mentioned earlier, ElasticSearch can also inspect an incoming document and create the index and type information automatically, using default settings.

hakdogan:~ hakdogan$ curl -XPUT localhost:9200/kodcucom/article/1 -d '{
> title: "Java API for JSON Processing - Stream-based JSON Produce and Consume",
> content: "Java API for JSON Processing (JSON-P) standard under the umbrella of the Java EE 7 in the JSR-353 specification is an enterprise java technology.",
> postDate: "2013-08-06T12:00:00",
> tags: ["Java"],
> author: "Rahman Usta"
> }'
{"ok":true,"_index":"kodcucom","_type":"article","_id":"1","_version":1}

Thus, the command above stores a JSON document with ID 1 in ElasticSearch, under the kodcucom index and the article type. If the index or the type does not exist yet, ElasticSearch creates it with default settings.
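To picture what creating type information automatically means: ElasticSearch guesses a mapping (a type for each field) from the values it receives, a feature called dynamic mapping. Below is a toy Python approximation, not ElasticSearch's implementation. The type names (string, long, double, boolean, date, object) follow the ElasticSearch versions of this article's era, and datetime.fromisoformat is only a stand-in for ElasticSearch's own date detection; guess_type and guess_mapping are hypothetical names.

```python
from datetime import datetime


def guess_type(value):
    """Toy dynamic mapping: guess a field type from a JSON value."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        try:
            datetime.fromisoformat(value)  # stand-in for date detection
            return "date"
        except ValueError:
            return "string"
    if isinstance(value, list):
        # An array takes the type of its elements; an empty array tells us nothing.
        return guess_type(value[0]) if value else None
    if isinstance(value, dict):
        return "object"
    return None


def guess_mapping(doc):
    """Map each field to a guessed type, skipping fields whose type is unknown."""
    mapping = {}
    for field, value in doc.items():
        field_type = guess_type(value)
        if field_type is not None:
            mapping[field] = field_type
    return mapping


article = {
    "title": "Java API for JSON Processing - Stream-based JSON Produce and Consume",
    "postDate": "2013-08-06T12:00:00",
    "tags": ["Java"],
    "author": "Rahman Usta",
}
print(guess_mapping(article))
```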

 

Getting a document

The ElasticSearch Get API retrieves a document by its ID.

hakdogan:~ hakdogan$ curl -XGET 'localhost:9200/kodcucom/article/1?pretty=true'
{
  "_index" : "kodcucom",
  "_type" : "article",
  "_id" : "1",
  "_version" : 1,
  "exists" : true,
  "_source" : {
    title: "Java API for JSON Processing - Stream-based JSON Produce and Consume",
    content: "Java API for JSON Processing (JSON-P) standard under the umbrella of the Java EE 7 in the JSR-353 specification is an enterprise java technology.",
    postDate: "2013-08-06T12:00:00",
    tags: ["Java"],
    author: "Rahman Usta"
  }
}

By default, the Get API is real-time: it is not affected by the refresh rate of the index, so a document can be retrieved as soon as it has been indexed, even before it becomes visible to search. You can also restrict a get operation to a set of fields by passing the fields parameter.

hakdogan:~ hakdogan$ curl -XGET 'localhost:9200/kodcucom/article/1?fields=title,author'
{
"_index":"kodcucom",
"_type":"article",
"_id":"1",
"_version":1,
"exists":true,
"fields": {
"author":"Rahman Usta",
"title":"Java API for JSON Processing - Stream-based JSON Produce and Consume"}
}
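When the field list comes from code, the URL can be assembled with the standard library. The Python sketch below builds, but does not send, a Get API request; ES_URL (a local node on the default port) and the function name get_document_request are assumptions for illustration.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: a single ElasticSearch node listening on the default port.
ES_URL = "http://localhost:9200"


def get_document_request(index, doc_type, doc_id, fields=None):
    """Build (but do not send) a Get API request, optionally limited to some fields."""
    url = f"{ES_URL}/{index}/{doc_type}/{doc_id}"
    if fields:
        # safe="," keeps the comma-separated list readable instead of %2C.
        url += "?" + urlencode({"fields": ",".join(fields)}, safe=",")
    return Request(url, method="GET")


req = get_document_request("kodcucom", "article", "1", fields=["title", "author"])
print(req.full_url)
```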

 

Getting multiple documents

The Multi Get API retrieves several documents in one request, each identified by its index, type (optional), and ID. The response contains a docs array with all the fetched documents.

hakdogan:~ hakdogan$ curl localhost:9200/_mget -d '{
> docs: [
>   {
>     _index: "kodcucom",
>     _type: "article",
>     _id: "1"
>   },
>   {
>     _index: "kodcucom",
>     _type: "article",
>     _id: "2"
>   }
> ]
> }'
{
  "docs": [
    {
      "_index":"kodcucom",
      "_type":"article",
      "_id":"1",
      "_version":1,
      "exists":true,
      "_source" : {
        title: "Java API for JSON Processing - Stream-based JSON Produce and Consume",
        content: "Java API for JSON Processing (JSON-P) standard under the umbrella of the Java EE 7 in the JSR-353 specification is an enterprise java technology.",
        postDate: "2013-08-06T12:00:00",
        tags: ["Java"],
        author: "Rahman Usta"
      }
    },
    {
      "_index":"kodcucom",
      "_type":"article",
      "_id":"2",
      "_version":1,
      "exists":true,
      "_source" : {
        title: "Core ElasticSearch Operations",
        content: "Elasticsearch is RESTful API driven",
        postDate: "2013-08-13T09:00:00",
        tags: ["elasticsearch", "big-data"],
        author: "Hüseyin Akdoğan"
      }
    }
  ]
}

The _mget endpoint can also be called on a specific index, or on an index and type; those values can then be omitted from the request body.

hakdogan:~ hakdogan$ curl localhost:9200/kodcucom/_mget -d '{
> docs: [
>   {
>     _type: "article",
>     _id: "1"
>   },
>   {
>     _type: "article",
>     _id: "2"
>   }
> ]
> }'
hakdogan:~ hakdogan$ curl localhost:9200/kodcucom/article/_mget -d '{
> docs: [
>   { _id: "1" },
>   { _id: "2" }
> ]
> }'

For simple requests, an ids array can be used instead of a docs array.

hakdogan:~ hakdogan$ curl localhost:9200/kodcucom/article/_mget -d '{
> ids: ["1", "2"]
> }'
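The request shapes above differ only in how much of each document reference moves into the URL. The Python sketch below picks the most compact form for a list of (index, type, id) references and returns the path and body rather than sending anything; the function name mget_request is hypothetical.

```python
import json


def mget_request(refs):
    """Pick the most compact _mget path and body for (index, type, id) triples.

    - one index and one type: /{index}/{type}/_mget with an ids array
    - one index, several types: /{index}/_mget with _type and _id per doc
    - several indices: /_mget with _index, _type and _id per doc
    """
    indices = {index for index, _, _ in refs}
    types = {doc_type for _, doc_type, _ in refs}
    if len(indices) == 1 and len(types) == 1:
        index, doc_type = next(iter(indices)), next(iter(types))
        return f"/{index}/{doc_type}/_mget", {"ids": [doc_id for _, _, doc_id in refs]}
    if len(indices) == 1:
        index = next(iter(indices))
        return f"/{index}/_mget", {"docs": [{"_type": t, "_id": d} for _, t, d in refs]}
    return "/_mget", {"docs": [{"_index": i, "_type": t, "_id": d} for i, t, d in refs]}


path, body = mget_request([("kodcucom", "article", "1"), ("kodcucom", "article", "2")])
print(path, json.dumps(body))
```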

You can also choose which fields to fetch for each requested document:

hakdogan:~ hakdogan$ curl localhost:9200/_mget -d '{
> docs: [
>   {
>     _index: "kodcucom",
>     _type: "article",
>     _id: "1",
>     fields: ["title", "author"]
>   },
>   {
>     _index: "kodcucom",
>     _type: "article",
>     _id: "2",
>     fields: ["postDate", "tags"]
>   }
> ]
> }'

 

Searching

The Search API executes a search query and returns the documents that match it. The query can be given either as a simple query string parameter in the URL or in a request body. Below is an example of each: the first passes only the fields parameter, so it matches all documents and returns their title and author fields; the second sends a range query in the request body.

hakdogan:~ hakdogan$ curl -XGET 'localhost:9200/kodcucom/article/_search?fields=title,author'
hakdogan:~ hakdogan$ curl -XGET localhost:9200/kodcucom/article/_search -d '{
> query: {range: {postDate: {from: "2013-01-01", to: "2013-08-13"}}}
> }'
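Both search styles can be built from code as well. The Python sketch below constructs, without sending, the two requests shown above; ES_URL and the function names uri_search and range_search are assumptions. A full-text query string would go in the q parameter, for example uri_search("kodcucom", "article", q="tags:java"). Newer ElasticSearch versions prefer gte/lte over from/to in range queries.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: a single ElasticSearch node listening on the default port.
ES_URL = "http://localhost:9200"


def uri_search(index, doc_type, **params):
    """Search with every option in the query string and no request body."""
    query = urlencode(params, safe=",:")  # keep commas and colons readable
    return Request(f"{ES_URL}/{index}/{doc_type}/_search?{query}", method="GET")


def range_search(index, doc_type, field, start, end):
    """Search with a range query in the body; from/to bounds are inclusive by default."""
    body = {"query": {"range": {field: {"from": start, "to": end}}}}
    return Request(
        f"{ES_URL}/{index}/{doc_type}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="GET",
    )


by_fields = uri_search("kodcucom", "article", fields="title,author")
by_range = range_search("kodcucom", "article", "postDate", "2013-01-01", "2013-08-13")
print(by_fields.full_url)
print(by_range.data.decode("utf-8"))
```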

 

Updating

The ElasticSearch Update API supports updating a document either with a script or by passing a partial document that is merged into the existing one. Every indexed document in ElasticSearch is versioned, and ElasticSearch relies on these versions to keep updates consistent. Note that an update is not applied in place: the whole document is reindexed.
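A toy model helps show why versions matter. In ElasticSearch, every write increments a document's _version, and a client can pass the version it expects (for example, with the Index API's version parameter); the write then fails with a conflict if the document has changed in the meantime. The Python class below is a hypothetical in-memory illustration of that idea, and of an update being a full reindex, not ElasticSearch code.

```python
class VersionConflict(Exception):
    """Raised when a write names a version that is no longer current."""


class ToyIndex:
    """Toy model of versioned documents: every write stores a complete new copy
    of the document and bumps its version; a write that names an outdated
    version is rejected (optimistic concurrency control)."""

    def __init__(self):
        self.docs = {}  # doc_id -> (version, source)

    def index(self, doc_id, source, expected_version=None):
        current, _ = self.docs.get(doc_id, (0, None))
        if expected_version is not None and expected_version != current:
            raise VersionConflict(f"expected version {expected_version}, found {current}")
        self.docs[doc_id] = (current + 1, dict(source))
        return current + 1

    def update(self, doc_id, changes, expected_version=None):
        """Get the current source, apply the changes, then reindex the whole document."""
        _, source = self.docs[doc_id]
        return self.index(doc_id, {**source, **changes}, expected_version)


store = ToyIndex()
v1 = store.index("1", {"title": "Java API for JSON Processing", "author": "Rahman Usta"})
v2 = store.update("1", {"author": "RAHMAN USTA"}, expected_version=v1)
try:
    store.update("1", {"author": "R. Usta"}, expected_version=v1)  # stale version
    conflict = False
except VersionConflict:
    conflict = True
print(v1, v2, conflict)
```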

hakdogan:~ hakdogan$ curl -XPOST localhost:9200/kodcucom/article/1/_update -d '{
> script: "ctx._source.tags += tag",
> params: {
> tag: "json-p"}
> }'
{"ok":true,"_index":"kodcucom","_type":"article","_id":"1","_version":2}

The command above is a script update. The ElasticSearch scripting module lets you use scripts to evaluate custom expressions, and by default scripts are written in MVEL. With language plugins, scripts can also be written in other languages (e.g., JavaScript, Groovy, Python).

Back to the command above: ctx is the script context. Through it, the script updates the tags field in the document's _source. A script can also take parameters; note the params section, which sets the value of the tag parameter used by the script.
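To make the roles of ctx and params concrete, here is a toy Python model of what that script does. ElasticSearch itself runs the MVEL script on the server; the functions run_update_script and append_tag are hypothetical and only mirror the script's effect on _source.

```python
def run_update_script(source, script, params=None):
    """Toy model of a script update: the script gets a ctx whose _source is a
    copy of the stored document, changes it, and the result is reindexed."""
    ctx = {"_source": dict(source)}
    script(ctx, params or {})
    return ctx["_source"]


def append_tag(ctx, params):
    # Python stand-in for the MVEL script: ctx._source.tags += tag
    ctx["_source"]["tags"] = ctx["_source"]["tags"] + [params["tag"]]


article = {"title": "Java API for JSON Processing", "tags": ["Java"]}
updated = run_update_script(article, append_tag, {"tag": "json-p"})
print(updated["tags"])
```

Because append_tag builds a new list instead of appending in place, the original article dictionary is left unchanged, much as the stored document stays intact until the reindexed version replaces it.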

A script update can also add a new field to the document:

hakdogan:~ hakdogan$ curl -XPOST localhost:9200/kodcucom/article/1/_update -d '{
> script: "ctx._source.temporaryField = \"temporary text\""
> }'
{"ok":true,"_index":"kodcucom","_type":"article","_id":"1","_version":3}

It can also remove a field:

hakdogan:~ hakdogan$ curl -XPOST localhost:9200/kodcucom/article/1/_update -d '{
> script: "ctx._source.remove(\"temporaryField\")"
> }'
{"ok":true,"_index":"kodcucom","_type":"article","_id":"1","_version":4}

Notice that the _version value in the output increases with every update. The Update API also accepts a partial document, which is merged into the existing document:

hakdogan:~ hakdogan$ curl -XPOST localhost:9200/kodcucom/article/1/_update -d '{
> doc: {
>   author: "RAHMAN USTA"
> }
> }'
{"ok":true,"_index":"kodcucom","_type":"article","_id":"1","_version":5}

Note that if both doc and script are specified, doc is ignored. ElasticSearch also supports preloaded scripts.
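The ElasticSearch documentation of this era describes the partial document merge as a simple recursive merge: inner objects are merged key by key, while plain values and arrays are replaced. The Python function below is a sketch of that rule, not ElasticSearch's code; merge_partial and the nested stats field are hypothetical.

```python
def merge_partial(existing, partial):
    """Merge a partial document into an existing one.

    Nested objects are merged key by key; plain values and arrays in the
    partial document replace the existing ones.
    """
    merged = dict(existing)
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_partial(merged[key], value)
        else:
            merged[key] = value
    return merged


# "stats" is a hypothetical nested object, added only to show the recursive case.
existing = {"author": "Rahman Usta", "tags": ["Java", "json-p"], "stats": {"views": 10, "likes": 2}}
partial = {"author": "RAHMAN USTA", "tags": ["Java"], "stats": {"views": 11}}
print(merge_partial(existing, partial))
```

Note that tags is replaced wholesale rather than merged, which is why the script form is the usual way to append a single tag.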

 

Deleting

Before examining the ElasticSearch Delete API, note that a document can also be deleted with a script, depending on the value of one of its fields:

hakdogan:~ hakdogan$ curl -XPOST localhost:9200/kodcucom/article/1/_update -d '{
> script: "ctx._source.tags.contains(tag) ? ctx.op = \"delete\" : ctx.op = \"none\"",
> params: { tag: "Java" }
> }'
{"ok":true,"_index":"kodcucom","_type":"article","_id":"1","_version":6}
hakdogan:~ hakdogan$ curl -XGET 'localhost:9200/kodcucom/article/1?pretty=true'
{
  "_index" : "kodcucom",
  "_type" : "article",
  "_id" : "1",
  "exists" : false
}
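The script above decides the outcome through ctx.op. The toy Python model below mirrors that decision: "index" reindexes the changed source, "none" leaves the document untouched, and "delete" removes it. The functions apply_script and delete_if_tagged are hypothetical; ElasticSearch evaluates the real MVEL script on the server.

```python
def apply_script(source, script, params):
    """Toy model of ctx.op: "index" reindexes the changed source, "none" leaves
    the document as it was, and "delete" removes it (returned here as None)."""
    ctx = {"_source": dict(source), "op": "index"}
    script(ctx, params)
    if ctx["op"] == "delete":
        return None
    if ctx["op"] == "none":
        return source
    return ctx["_source"]


def delete_if_tagged(ctx, params):
    # Python stand-in for:
    # ctx._source.tags.contains(tag) ? ctx.op = "delete" : ctx.op = "none"
    ctx["op"] = "delete" if params["tag"] in ctx["_source"]["tags"] else "none"


article = {"title": "Java API for JSON Processing", "tags": ["Java", "json-p"]}
print(apply_script(article, delete_if_tagged, {"tag": "Java"}))    # None: deleted
print(apply_script(article, delete_if_tagged, {"tag": "Python"}))  # unchanged document
```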

The Delete API deletes a document by its ID. Because the script above has already deleted document 1, this example assumes the document has been indexed again.

hakdogan:~ hakdogan$ curl -XDELETE localhost:9200/kodcucom/article/1
{"ok":true,"found":true,"_index":"kodcucom","_type":"article","_id":"1","_version":2}

The Delete Index API deletes an index.

hakdogan:~ hakdogan$ curl -XDELETE localhost:9200/kodcucom

By default, the Delete Index API can also be applied to more than one index at once; sending it to the root URL, as below, deletes all indices.

hakdogan:~ hakdogan$ curl -XDELETE localhost:9200

To prevent all indices from being deleted this way, set action.disable_delete_all_indices to true (for example, action.disable_delete_all_indices: true in config/elasticsearch.yml).
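Client code can add its own safety net on top of that setting. The Python sketch below builds, without sending, DELETE requests for one index or one document, and refuses targets that would address every index (the root URL, _all, or *). ES_URL, ALL_INDICES, and delete_request are hypothetical names; this is a client-side guard, not an ElasticSearch feature.

```python
from urllib.request import Request

# Assumption: a single ElasticSearch node listening on the default port.
ES_URL = "http://localhost:9200"

# Index names that would address every index at once.
ALL_INDICES = {"", "_all", "*"}


def delete_request(index, doc_type=None, doc_id=None):
    """Build (but do not send) a DELETE request for one index or one document.

    Refuses targets that would delete every index, in the spirit of
    action.disable_delete_all_indices, and requires doc_type and doc_id
    together so that a request never targets a whole type by accident.
    """
    name = "" if index is None else index.strip("/")
    if name in ALL_INDICES:
        raise ValueError("refusing to build a request that deletes all indices")
    if (doc_type is None) != (doc_id is None):
        raise ValueError("pass both doc_type and doc_id to delete a single document")
    path = name if doc_id is None else f"{name}/{doc_type}/{doc_id}"
    return Request(f"{ES_URL}/{path}", method="DELETE")


doc_req = delete_request("kodcucom", "article", "1")
index_req = delete_request("kodcucom")
print(doc_req.get_method(), doc_req.full_url)
print(index_req.get_method(), index_req.full_url)
```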
