You're reading from Learning Elasticsearch Structured and unstructured data using distributed real-time search and analytics

Product type Paperback

Published in Jun 2017

Publisher Packt

ISBN-13 9781787128453

Length 404 pages

Edition 1st Edition

Author (1):

Andhavarapu


Interacting with Elasticsearch

The primary way of interacting with Elasticsearch is through its REST API. Elasticsearch provides a JSON-based REST API over HTTP, which listens on port 9200 by default. Anything from creating an index to shutting down a node is a simple REST call. The APIs are broadly classified as follows:

Document APIs: CRUD (create, retrieve, update, delete) operations on documents
Search APIs: For all the search operations
Indices APIs: For managing indices (creating an index, deleting an index, and so on)
Cat APIs: Instead of JSON, the data is returned in tabular form
Cluster APIs: For managing the cluster

A chapter is dedicated to each of these, where they are discussed in more detail: for example, indexing documents is covered in Chapter 4, Indexing and Updating Your Data, and search in Chapter 6, All About Search. In this section, we will go through some basic CRUD operations using the Document APIs; it is only a brief introduction to manipulating data with them. To use Elasticsearch in your application, clients are also provided in all major languages, such as Java and Python. The majority of the clients act as wrappers around the REST API.

To better explain the CRUD operations, imagine we are building an e-commerce site and want to use Elasticsearch to power its search functionality. We will use an index named chapter1 and store all the products in a type called product. Each product we want to index is represented by a JSON document. We will start by creating a new product document, then retrieve a product by its identifier, then update a product's category, and finally delete a product using its identifier.

Creating a document

A new document can be added using the Document APIs. For the e-commerce example, to add a new product, we execute the following command. The body of the request is the product document we want to index.

PUT http://localhost:9200/chapter1/product/1
{
  "title": "Learning Elasticsearch",
  "author": "Abhishek Andhavarapu",
  "category": "books"
}

Let's inspect the request:

INDEX	chapter1
TYPE	product
IDENTIFIER	1
DOCUMENT	JSON
HTTP METHOD	PUT

The document's properties, such as title, author, and category, are also known as fields, which are similar to SQL columns.
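The same request can be assembled from an application. The following is a minimal sketch using only the Python standard library; it builds the request object without sending it, so it runs without a node. The names BASE_URL and build_index_request are illustrative and not part of any Elasticsearch client, and the Content-Type header is an assumption about what the server expects.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:9200"  # default REST port mentioned in the text


def build_index_request(index, doc_type, doc_id, document):
    """Build (but do not send) a PUT request that indexes `document`."""
    # The URL is composed of the index, the type, and the identifier.
    url = f"{BASE_URL}/{index}/{doc_type}/{doc_id}"
    # The request body is the JSON document itself.
    body = json.dumps(document).encode("utf-8")
    return Request(url, data=body, method="PUT",
                   headers={"Content-Type": "application/json"})


product = {
    "title": "Learning Elasticsearch",
    "author": "Abhishek Andhavarapu",
    "category": "books",
}
request = build_index_request("chapter1", "product", 1, product)
print(request.get_method(), request.full_url)
# prints: PUT http://localhost:9200/chapter1/product/1
```

Against a running node, the request could then be sent with urllib.request.urlopen(request).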

Note

Elasticsearch will automatically create the index chapter1 and type product if they don't exist already. It will create the index with the default settings.

When we execute the preceding request, Elasticsearch responds with a JSON response, shown as follows:

{
  "_index": "chapter1",
  "_type": "product",
  "_id": "1",
  "_version": 1,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "created": true
}

In the response, you can see that Elasticsearch created the document and the version of the document is 1. Since you are creating the document using the HTTP PUT method, you are required to specify the document identifier. If you don't specify the identifier, Elasticsearch will respond with the following error message:

No handler found for uri [/chapter1/product/] and method [PUT]

If you don't have a unique identifier, you can let Elasticsearch assign one for you, but you must then use the HTTP POST method. For example, if you are indexing log messages, you will not have a unique identifier for each log message, and you can let Elasticsearch assign the identifier.

Note

In general, we use the HTTP POST method for creating an object. The HTTP PUT method can also be used for object creation, where the client provides the unique identifier instead of the server assigning the identifier.

We can index a document without specifying a unique identifier as shown here:

POST http://localhost:9200/chapter1/product/
{
  "title": "Learning Elasticsearch",
  "author": "Abhishek Andhavarapu",
  "category": "books"
}

In the preceding request, the URL doesn't contain the unique identifier, and we are using the HTTP POST method. Let's inspect the request:

INDEX	chapter1
TYPE	product
DOCUMENT	JSON
HTTP METHOD	POST
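The choice between PUT and POST described so far comes down to whether the client supplies the identifier. A small sketch of that rule follows; index_endpoint is a hypothetical helper written for illustration, not an Elasticsearch API.

```python
def index_endpoint(index, doc_type, doc_id=None):
    """Return the (HTTP method, URL path) used to index one document."""
    if doc_id is None:
        # No identifier from the client: POST and let the server assign one.
        return "POST", f"/{index}/{doc_type}/"
    # Client-supplied identifier: PUT to the document's own URL.
    return "PUT", f"/{index}/{doc_type}/{doc_id}"


print(index_endpoint("chapter1", "product", 1))  # ('PUT', '/chapter1/product/1')
print(index_endpoint("chapter1", "product"))     # ('POST', '/chapter1/product/')
```

The check uses `is None` rather than truthiness so that an identifier such as 0 is still treated as client-supplied.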

The response from Elasticsearch is shown as follows:

{
  "_index": "chapter1",
  "_type": "product",
  "_id": "AVmKvtPwWuEuqke_aRsm",
  "_version": 1,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "created": true
}

You can see from the response that Elasticsearch assigned the unique identifier AVmKvtPwWuEuqke_aRsm to the document and that the created flag is set to true. If a document with the same unique identifier already exists, Elasticsearch replaces the existing document and increments the document version. If you were to run the same PUT request from the beginning of this section again, the response from Elasticsearch would be this:

{
  "_index": "chapter1",
  "_type": "product",
  "_id": "1",
  "_version": 2,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "created": false
}

In the response, you can see that the created flag is false since the document with id: 1 already exists. Also, observe that the version is now 2.
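An application will usually inspect these fields rather than read the raw JSON. The sketch below assumes the response shape printed in this section (a created flag alongside _id and _version); other Elasticsearch versions may report the outcome differently. describe_index_response is an illustrative name, and the inputs are trimmed copies of the two responses to the same PUT request.

```python
import json


def describe_index_response(response):
    """Summarize an index response of the shape shown in this section."""
    # created is true for a new document, false when one was replaced.
    action = "created" if response["created"] else "replaced"
    return f"{response['_id']}: {action}, now at version {response['_version']}"


# Trimmed copies of the first and second responses to the same PUT request.
first = json.loads('{"_id": "1", "_version": 1, "created": true}')
second = json.loads('{"_id": "1", "_version": 2, "created": false}')

print(describe_index_response(first))   # 1: created, now at version 1
print(describe_index_response(second))  # 1: replaced, now at version 2
```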

Retrieving an existing document

To retrieve an existing document, we need the index, the type, and the unique identifier of the document. Let's try to retrieve the document we just indexed. To retrieve a document, we use the HTTP GET method, as shown here:

GET http://localhost:9200/chapter1/product/1

Let's inspect the request:

INDEX	chapter1
TYPE	product
IDENTIFIER	1
HTTP METHOD	GET

The response from Elasticsearch, shown here, contains the product document we indexed in the previous section:

{
  "_index": "chapter1",
  "_type": "product",
  "_id": "1",
  "_version": 2,
  "found": true,
  "_source": {
    "title": "Learning Elasticsearch",
    "author": "Abhishek Andhavarapu",
    "category": "books"
  }
}

The actual JSON document is stored in the _source field. Also note the version in the response; every time the document is updated, the version is incremented.
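To get the original document back out of such a response, a client only needs the _source field. The following sketch parses a copy of the response shown in this section; extract_document is a hypothetical helper, and treating a missing or false found flag as "no document" is an assumption about how a miss is reported.

```python
import json

# Copy of the GET response shown in this section.
RESPONSE_TEXT = """
{
  "_index": "chapter1",
  "_type": "product",
  "_id": "1",
  "_version": 2,
  "found": true,
  "_source": {
    "title": "Learning Elasticsearch",
    "author": "Abhishek Andhavarapu",
    "category": "books"
  }
}
"""


def extract_document(response):
    """Return (version, document) from a GET response, or None if not found."""
    if not response.get("found", False):
        return None
    return response["_version"], response["_source"]


version, product = extract_document(json.loads(RESPONSE_TEXT))
print(version, product["title"])  # 2 Learning Elasticsearch
```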

Updating an existing document

Updating a document in Elasticsearch is more involved than in a traditional SQL database. Internally, Elasticsearch retrieves the old document, applies the changes, and re-inserts the result as a new document. Because every update rewrites the whole document, the update operation is expensive. There are different ways of updating a document. Here we look at updating part of a document; updates are covered in more detail in the Updating your data section in Chapter 4, Indexing and Updating Your Data.

Updating a partial document

We already indexed the document with the unique identifier 1, and now we need to update the category of the product from just books to technical books. We can update the document as shown here:

POST http://localhost:9200/chapter1/product/1/_update
{
  "doc": {
    "category": "technical books"
  }
}

The body of the request contains, under the doc key, the fields of the document we want to update; the unique identifier is passed in the URL.

Note

Please note the _update endpoint at the end of the URL.

The response from Elasticsearch is shown here:

{
  "_index": "chapter1",
  "_type": "product",
  "_id": "1",
  "_version": 3,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  }
}

As you can see in the response, the operation is successful, and the version of the document is now 3. More complicated update operations are possible using scripts and upserts.
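The retrieve, apply, and re-insert cycle described at the start of this section can be modelled with a toy in-memory store. This is only an illustration of the idea, not how Elasticsearch is implemented: ToyIndex and its methods are hypothetical, and the merge here is a shallow dictionary merge chosen for brevity.

```python
class ToyIndex:
    """Hypothetical in-memory stand-in for an index; not Elasticsearch code."""

    def __init__(self):
        self._docs = {}  # doc_id -> (version, source dict)

    def index(self, doc_id, source):
        """Store a whole document, replacing any existing one."""
        old_version = self._docs.get(doc_id, (0, None))[0]
        self._docs[doc_id] = (old_version + 1, dict(source))
        return old_version + 1

    def update(self, doc_id, partial_doc):
        """Partial update: retrieve, apply the changes, re-insert."""
        _, old_source = self._docs[doc_id]           # 1. retrieve the old document
        new_source = {**old_source, **partial_doc}   # 2. apply the changes
        return self.index(doc_id, new_source)        # 3. re-insert the whole document

    def get(self, doc_id):
        """Return the (version, source) pair for a stored document."""
        return self._docs[doc_id]


book = {"title": "Learning Elasticsearch", "category": "books"}
store = ToyIndex()
store.index("1", book)  # version 1
store.index("1", book)  # same document indexed again: version 2
version = store.update("1", {"category": "technical books"})
print(version, store.get("1")[1]["category"])  # 3 technical books
```

The version sequence mirrors the one in this section: two index operations followed by one partial update leave the document at version 3, with the untouched title field carried over.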

Deleting an existing document

For creating and retrieving a document, we used the PUT, POST, and GET methods. For deleting an existing document, we need to use the HTTP DELETE method and pass the unique identifier of the document in the URL, as shown here:

DELETE http://localhost:9200/chapter1/product/1

Let's inspect the request:

INDEX	chapter1
TYPE	product
IDENTIFIER	1
HTTP METHOD	DELETE

The response from Elasticsearch is shown here:

{
  "found": true,
  "_index": "chapter1",
  "_type": "product",
  "_id": "1",
  "_version": 4,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  }
}

In the response, you can see that Elasticsearch was able to find the document with the unique identifier 1 and was successful in deleting the document.
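From application code, a delete differs from the other calls only in the HTTP method and in having no body. A minimal standard-library sketch follows; it builds the request without sending it, and BASE_URL and build_delete_request are illustrative names rather than part of any Elasticsearch client.

```python
from urllib.request import Request

BASE_URL = "http://localhost:9200"  # default REST port mentioned in the text


def build_delete_request(index, doc_type, doc_id):
    """Build (but do not send) a DELETE request for one document."""
    url = f"{BASE_URL}/{index}/{doc_type}/{doc_id}"
    # No body: the document is addressed entirely by the URL.
    return Request(url, method="DELETE")


request = build_delete_request("chapter1", "product", 1)
print(request.get_method(), request.full_url)
# prints: DELETE http://localhost:9200/chapter1/product/1
```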