Interacting with Elasticsearch
The primary way of interacting with Elasticsearch is via its REST API. Elasticsearch provides a JSON-based REST API over HTTP, which by default listens on port 9200. Anything from creating an index to shutting down a node is a simple REST call. The APIs are broadly classified into the following:
- Document APIs: for CRUD (create, retrieve, update, and delete) operations on documents
- Search APIs: for all search operations
- Indices APIs: for managing indices (creating an index, deleting an index, and so on)
- Cat APIs: for returning data in tabular form instead of JSON
- Cluster APIs: for managing the cluster
Each of these is discussed in detail in its own chapter; for example, indexing documents is covered in Chapter 4, Indexing and Updating Your Data, and search in Chapter 6, All About Search. In this section, we will go through some basic CRUD operations using the Document APIs; it is only a brief introduction to manipulating data with them. To use Elasticsearch from your application, clients are also available in all major languages, such as Java and Python. Most of these clients act as wrappers around the REST API.
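Because most clients are thin wrappers over the REST API, it can help to see what such a wrapper boils down to. The following is a minimal Python sketch that uses only the standard library's urllib.request; the helper name build_request and the base URL http://localhost:9200 (the default port mentioned above) are assumptions for illustration, not the API of any official client. The helper only builds the HTTP request; it does not send it.

```python
import json
import urllib.request

# Assumed base URL: a local node listening on the default REST port, 9200.
BASE_URL = "http://localhost:9200"


def build_request(method, path, body=None):
    """Hypothetical helper: turn a REST call into a urllib Request object.

    The request is only constructed here, not sent.
    """
    data = None
    headers = {}
    if body is not None:
        # Request bodies are sent as JSON.
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(BASE_URL + path, data=data,
                                  headers=headers, method=method)


# Example: the request behind a simple cluster health check (a Cluster API).
req = build_request("GET", "/_cluster/health")
print(req.get_method(), req.full_url)
```

With a node running locally, passing the object to urllib.request.urlopen would send it and return the JSON response; the language clients build on the same idea.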
To better explain the CRUD operations, imagine we are building an e-commerce site and want to use Elasticsearch to power its search functionality. We will use an index named chapter1 and store all the products in the type called product. Each product we want to index is represented by a JSON document. We will start by creating a new product document; then we will retrieve a product by its identifier, update a product's category, and finally delete a product using its identifier.
Creating a document
A new document can be added using the Document APIs. For the e-commerce example, to add a new product, we execute the following command. The body of the request is the product document we want to index.
PUT http://localhost:9200/chapter1/product/1
{
  "title": "Learning Elasticsearch",
  "author": "Abhishek Andhavarapu",
  "category": "books"
}
Let's inspect the request:
INDEX | chapter1 |
TYPE | product |
IDENTIFIER | 1 |
DOCUMENT | JSON |
HTTP METHOD | PUT |
The document's properties, such as title, author, and category, are known as fields, which are similar to columns in SQL.
Note
Elasticsearch will automatically create the index chapter1 and the type product if they don't already exist, using the default settings.
When we execute the preceding request, Elasticsearch responds with a JSON response, shown as follows:
{
  "_index": "chapter1",
  "_type": "product",
  "_id": "1",
  "_version": 1,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "created": true
}
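The same create call can be expressed in Python. This sketch, using only the standard library, builds the PUT request with the identifier in the URL and then reads the created flag and _version from the response shown above. The response is pasted in as a string rather than fetched from a live node, and the variable names are illustrative.

```python
import json
import urllib.request

product = {
    "title": "Learning Elasticsearch",
    "author": "Abhishek Andhavarapu",
    "category": "books",
}

# PUT with an explicit identifier (1) in the URL: /index/type/id.
req = urllib.request.Request(
    "http://localhost:9200/chapter1/product/1",
    data=json.dumps(product).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="PUT",
)

# The response shown above, used here instead of calling a live node.
response_text = """{"_index": "chapter1", "_type": "product", "_id": "1",
 "_version": 1, "_shards": {"total": 1, "successful": 1, "failed": 0},
 "created": true}"""
response = json.loads(response_text)

print(req.get_method(), req.full_url)
print(response["created"], response["_version"])  # True 1
```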
In the response, you can see that Elasticsearch created the document and that the version of the document is 1. Because you are creating the document using the HTTP PUT method, you are required to specify the document identifier. If you don't specify the identifier, Elasticsearch responds with the following error message:
No handler found for uri [/chapter1/product/] and method [PUT]
If you don't have a unique identifier, you can let Elasticsearch assign one for you, but you must then use the HTTP POST method. For example, log messages typically have no natural unique identifier, so when indexing them you can let Elasticsearch assign an identifier to each message.
Note
In general, the HTTP POST method is used to create an object, with the server assigning its identifier. The HTTP PUT method can also be used to create an object when the client provides the unique identifier instead.
We can index a document without specifying a unique identifier as shown here:
POST http://localhost:9200/chapter1/product/
{
  "title": "Learning Elasticsearch",
  "author": "Abhishek Andhavarapu",
  "category": "books"
}
In this request, the URL doesn't contain a unique identifier, and we use the HTTP POST method. Let's inspect the request:
INDEX | chapter1 |
TYPE | product |
DOCUMENT | JSON |
HTTP METHOD | POST |
The response from Elasticsearch is shown as follows:
{
  "_index": "chapter1",
  "_type": "product",
  "_id": "AVmKvtPwWuEuqke_aRsm",
  "_version": 1,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "created": true
}
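When Elasticsearch assigns the identifier, the client has to read it from the response if it wants to fetch the document later. The Python sketch below builds the POST request with no identifier in the URL, takes _id from the response shown above (pasted in as a string, not fetched), and uses it to form the URL for a later GET. The variable names are illustrative only.

```python
import json
import urllib.request

product = {
    "title": "Learning Elasticsearch",
    "author": "Abhishek Andhavarapu",
    "category": "books",
}

# POST to /index/type/ with no identifier: Elasticsearch assigns one.
req = urllib.request.Request(
    "http://localhost:9200/chapter1/product/",
    data=json.dumps(product).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# The response shown above; the generated identifier is in "_id".
response = json.loads(
    '{"_index": "chapter1", "_type": "product", "_id": "AVmKvtPwWuEuqke_aRsm",'
    ' "_version": 1, "_shards": {"total": 1, "successful": 1, "failed": 0},'
    ' "created": true}'
)
generated_id = response["_id"]

# Keep the identifier: it is needed to retrieve, update, or delete the document.
get_url = "http://localhost:9200/chapter1/product/" + generated_id
print(req.get_method(), get_url)
```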
You can see from the response that Elasticsearch assigned the unique identifier AVmKvtPwWuEuqke_aRsm to the document and that the created flag is set to true. If a document with the same unique identifier already exists, Elasticsearch replaces the existing document and increments its version. If you run the same PUT request from the beginning of this section again, Elasticsearch responds with the following:
{
  "_index": "chapter1",
  "_type": "product",
  "_id": "1",
  "_version": 2,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "created": false
}
In the response, you can see that the created flag is false because a document with the identifier 1 already exists. Also, observe that the version is now 2.
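To make the created and _version semantics concrete, here is a toy in-memory model in Python. It is not how Elasticsearch stores documents; the ToyIndex class and its put method are hypothetical and only mimic the two response fields discussed above: a new identifier yields created true and version 1, while indexing again with the same identifier replaces the document, returns created false, and increments the version.

```python
class ToyIndex:
    """Hypothetical in-memory stand-in for one index/type (illustration only)."""

    def __init__(self):
        self._docs = {}  # maps identifier -> (version, document)

    def put(self, doc_id, document):
        """Create or replace a document, mimicking the created/_version fields."""
        if doc_id in self._docs:
            version = self._docs[doc_id][0] + 1  # replace: bump the version
            created = False
        else:
            version = 1  # first time this identifier is seen
            created = True
        self._docs[doc_id] = (version, dict(document))
        return {"_id": doc_id, "_version": version, "created": created}


products = ToyIndex()
book = {"title": "Learning Elasticsearch", "category": "books"}
print(products.put("1", book))  # {'_id': '1', '_version': 1, 'created': True}
print(products.put("1", book))  # {'_id': '1', '_version': 2, 'created': False}
```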
Retrieving an existing document
To retrieve an existing document, we need its index, type, and unique identifier. Let's retrieve the document we just indexed using the HTTP GET method, as shown below:
GET http://localhost:9200/chapter1/product/1
Let's inspect the request:
INDEX | chapter1 |
TYPE | product |
IDENTIFIER | 1 |
HTTP METHOD | GET |
The response from Elasticsearch, shown below, contains the product document we indexed in the previous section:
{
  "_index": "chapter1",
  "_type": "product",
  "_id": "1",
  "_version": 2,
  "found": true,
  "_source": {
    "title": "Learning Elasticsearch",
    "author": "Abhishek Andhavarapu",
    "category": "books"
  }
}
The original JSON document is stored in the _source field. Also note the version in the response; it is incremented every time the document is updated.
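In client code, retrieving a document usually means checking the found flag and then reading _source. The Python sketch below builds the GET request (no body) and extracts the product from the response shown above, which is pasted in as a string rather than fetched from a running node.

```python
import json
import urllib.request

# GET /index/type/id; no request body is needed.
req = urllib.request.Request(
    "http://localhost:9200/chapter1/product/1", method="GET"
)

# The response shown above; the original document is under "_source".
response = json.loads(
    '{"_index": "chapter1", "_type": "product", "_id": "1", "_version": 2,'
    ' "found": true, "_source": {"title": "Learning Elasticsearch",'
    ' "author": "Abhishek Andhavarapu", "category": "books"}}'
)

if response["found"]:
    product = response["_source"]
    print(product["title"], "version", response["_version"])
```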
Updating an existing document
Updating a document in Elasticsearch is more involved than in a traditional SQL database. Internally, Elasticsearch retrieves the old document, applies the changes, and re-indexes the result as a new document, which makes the update operation expensive. There are several ways to update a document. Here, we will cover partial updates; updates are discussed in more detail in the Updating your data section of Chapter 4, Indexing and Updating Your Data.
Updating a partial document
We already indexed the document with the unique identifier 1, and now we need to change the category of the product from books to technical books. We can update the document as shown here:
POST http://localhost:9200/chapter1/product/1/_update
{
  "doc": {
    "category": "technical books"
  }
}
The body of the request contains, under the doc key, the fields of the document we want to update, and the unique identifier is passed in the URL.
Note
Note the _update endpoint at the end of the URL.
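A partial update sends only the changed fields, wrapped in doc, to the _update endpoint. The Python sketch below builds that POST request and then, as a rough local picture of the outcome, overlays the changes on the current document with a top-level dictionary merge. That merge is a simplification assumed for this flat example: Elasticsearch merges nested objects recursively, which this sketch does not attempt.

```python
import json
import urllib.request

# Partial update: POST to .../_update with the changed fields under "doc".
changes = {"category": "technical books"}
req = urllib.request.Request(
    "http://localhost:9200/chapter1/product/1/_update",
    data=json.dumps({"doc": changes}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# Rough local picture of the result for this flat document: fields in "doc"
# overwrite the matching fields and all other fields are kept.
# (Elasticsearch merges nested objects recursively; this merge does not.)
current = {
    "title": "Learning Elasticsearch",
    "author": "Abhishek Andhavarapu",
    "category": "books",
}
updated = {**current, **changes}
print(updated["category"], "|", updated["title"])
```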
The response from Elasticsearch is shown here:
{
  "_index": "chapter1",
  "_type": "product",
  "_id": "1",
  "_version": 3,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  }
}
As you can see in the response, the operation succeeded, and the version of the document is now 3. More complex update operations are possible using scripts and upserts.
Deleting an existing document
To create documents, we used the PUT and POST methods, and to retrieve one, the GET method. To delete an existing document, we use the HTTP DELETE method and pass the unique identifier of the document in the URL, as shown here:
DELETE http://localhost:9200/chapter1/product/1
Let's inspect the request:
INDEX | chapter1 |
TYPE | product |
IDENTIFIER | 1 |
HTTP METHOD | DELETE |
The response from Elasticsearch is shown here:
{
  "found": true,
  "_index": "chapter1",
  "_type": "product",
  "_id": "1",
  "_version": 4,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  }
}
In the response, you can see that Elasticsearch found the document with the unique identifier 1 and deleted it successfully. The delete operation also incremented the document's version, to 4.
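Deleting from code follows the same pattern as the other calls: the identifier goes in the URL and no body is sent. The Python sketch below builds the DELETE request and checks the found flag and shard failures in the response shown above, which is pasted in as a string instead of being fetched from a live node.

```python
import json
import urllib.request

# DELETE /index/type/id; the identifier goes in the URL, no body.
req = urllib.request.Request(
    "http://localhost:9200/chapter1/product/1", method="DELETE"
)

# The response shown above.
response = json.loads(
    '{"found": true, "_index": "chapter1", "_type": "product", "_id": "1",'
    ' "_version": 4, "_shards": {"total": 1, "successful": 1, "failed": 0}}'
)

# "found" reports whether a document with this identifier existed.
deleted = response["found"] and response["_shards"]["failed"] == 0
print(req.get_method(), deleted)
```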