In this article by Alberto Paro, the author of ElasticSearch Cookbook Second Edition, we will cover the following recipes:

Installing additional script plugins

Managing scripts

Sorting data using scripts

Computing return fields with scripting

Filtering a search via scripting

Introduction

ElasticSearch has a powerful way of extending its capabilities with custom scripts, which can be written in several programming languages. The most common ones are Groovy, MVEL, JavaScript, and Python.

In this article, we will see how to create custom scoring algorithms, specially processed return fields, custom sorting, and complex update operations on records.

The scripting concept of ElasticSearch can be seen as an advanced stored procedure system for the NoSQL world, so mastering it is very important for advanced usage of ElasticSearch.

Installing additional script plugins

ElasticSearch provides native scripting (Java code compiled into a JAR) and Groovy, but many other interesting languages are also available, such as JavaScript and Python. In ElasticSearch releases prior to version 1.4, the official scripting language was MVEL. Because MVEL was not well maintained by its developers and could not be sandboxed to prevent security issues, it was replaced with Groovy. Groovy scripting is now provided by default in ElasticSearch. The other scripting languages can be installed as plugins.

Getting ready

You will need a working ElasticSearch cluster.

How to do it...

In order to install JavaScript language support for ElasticSearch (1.3.x), perform the following steps:

From the command line, simply enter the following command:

bin/plugin --install elasticsearch/elasticsearch-lang-javascript/2.3.0

This will print the following result:

-> Installing elasticsearch/elasticsearch-lang-javascript/2.3.0...
Trying http://download.elasticsearch.org/elasticsearch/elasticsearch-lang-javascript/elasticsearch-lang-javascript-2.3.0.zip...
Downloading ....DONE
Installed lang-javascript

If the installation is successful, the output will end with Installed; otherwise, an error is returned.

To install Python language support for ElasticSearch, just enter the following command:
```
bin/plugin --install elasticsearch/elasticsearch-lang-python/2.3.0
```
The version number depends on the ElasticSearch version. Take a look at the plugin's web page to choose the correct version.

How it works...

Language plugins allow you to extend the number of supported languages to be used in scripting.

During the ElasticSearch startup, an internal ElasticSearch service called PluginService loads all the installed language plugins.

After installing or upgrading a plugin, you need to restart the node so that the change is loaded.

The ElasticSearch community provides common scripting languages (a list of the supported scripting languages is available on the ElasticSearch site plugin page at http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-plugins.html), and others are available in GitHub repositories (a simple search on GitHub allows you to find them).

The following are the most commonly used languages for scripting:

Groovy (http://groovy.codehaus.org/): This language is embedded in ElasticSearch by default. It is a simple language that provides scripting functionalities, and it is one of the fastest available language extensions. Groovy is a dynamic, object-oriented programming language with features similar to those of Python, Ruby, Perl, and Smalltalk. It also supports writing functional code.

JavaScript (https://github.com/elasticsearch/elasticsearch-lang-javascript): This is available as an external plugin. The JavaScript implementation is based on Java Rhino (https://developer.mozilla.org/en-US/docs/Rhino) and is really fast.

Python (https://github.com/elasticsearch/elasticsearch-lang-python): This is available as an external plugin, based on Jython (http://jython.org). It allows Python to be used as a script engine. Considering several benchmark results, it's slower than other languages.

There's more...

Groovy is preferred if the script is not too complex; otherwise, a native plugin provides a better environment to implement complex logic and data management.

The performance of every language is different; the fastest one is native Java. Among the dynamic scripting languages, Groovy is faster than JavaScript and Python.

In order to access document properties in Groovy scripts, the same approach will work as in other scripting languages:

doc.score: This stores the document's score.

doc['field_name'].value: This extracts the value of the field_name field from the document. If the value is an array or if you want to extract the value as an array, you can use doc['field_name'].values.

doc['field_name'].empty: This returns true if the field_name field has no value in the document.

doc['field_name'].multiValued: This returns true if the field_name field contains multiple values.

If the field contains a geopoint value, additional methods are available, as follows:

doc['field_name'].lat: This returns the latitude of a geopoint. If you need the value as an array, you can use the doc['field_name'].lats method.

doc['field_name'].lon: This returns the longitude of a geopoint. If you need the value as an array, you can use the doc['field_name'].lons method.

doc['field_name'].distance(lat,lon): This returns the plane distance, in miles, from a latitude/longitude point. If you need to calculate the distance in kilometers, you should use the doc['field_name'].distanceInKm(lat,lon) method.

doc['field_name'].arcDistance(lat,lon): This returns the arc distance, in miles, from a latitude/longitude point. If you need to calculate the distance in kilometers, you should use the doc['field_name'].arcDistanceInKm(lat,lon) method.

doc['field_name'].geohashDistance(geohash): This returns the distance, in miles, from a geohash value. If you need to calculate the same distance in kilometers, you should use the doc['field_name'].geohashDistanceInKm(geohash) method.

By using these helper methods, it is possible to create advanced scripts that boost a document by distance, which can be very handy when developing geolocation-centered applications.
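
To make the idea of an arc distance more concrete, the following Python sketch computes a great-circle distance with the haversine formula. It only illustrates the concept and is not the code that ElasticSearch runs; the function names, the mean Earth radius of 6371 km, and the sample coordinates are assumptions of this example.

```python
import math

EARTH_RADIUS_KM = 6371.0   # mean Earth radius; an assumption of this sketch
KM_PER_MILE = 1.609344     # international mile expressed in kilometers


def arc_distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers, using the haversine formula."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp to 1.0 to guard against tiny floating-point overshoot.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def arc_distance_miles(lat1, lon1, lat2, lon2):
    """Same distance converted to miles."""
    return arc_distance_km(lat1, lon1, lat2, lon2) / KM_PER_MILE


if __name__ == "__main__":
    # Hypothetical pair of points: the same latitude, one degree of longitude apart.
    print(arc_distance_km(45.0, 9.0, 45.0, 10.0))
    print(arc_distance_miles(45.0, 9.0, 45.0, 10.0))
```

A plane distance treats the coordinates as points on a flat surface, so it is cheaper to compute but less accurate over long ranges than an arc distance like the one sketched here.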

Managing scripts

Depending on your scripting usage, there are several ways to customize ElasticSearch to use your script extensions.

In this recipe, we will see how to provide scripts to ElasticSearch via files, indexes, or inline.

Getting ready

You will need a working ElasticSearch cluster populated with the populate script (chapter_06/populate_aggregations.sh), available at https://github.com/aparo/elasticsearch-cookbook-second-edition.

How to do it...

To manage scripting, perform the following steps:

Dynamic scripting is disabled by default for security reasons; we need to activate it in order to use dynamic scripting languages such as JavaScript or Python. To do this, we need to turn off the disable flag (script.disable_dynamic: false) in the ElasticSearch configuration file (config/elasticsearch.yml) and restart the cluster.

To increase security, ElasticSearch does not allow you to specify scripts for non-sandboxed languages. Scripts can be placed in the scripts directory inside the configuration directory. To provide a script in a file, we'll put a my_script.groovy script in the config/scripts directory with the following content:
```
doc["price"].value * factor
```

If the dynamic script is enabled (as done in the first step), ElasticSearch allows you to store the scripts in a special index, .scripts. To put my_script in the index, execute the following command in the command terminal:
```
curl -XPOST localhost:9200/_scripts/groovy/my_script -d '{
"script":"doc[\"price\"].value * factor"
}'
```

The script can be used by simply referencing it in the script_id field; use the following command:

curl -XGET 'http://127.0.0.1:9200/test-index/test-type/_search?&pretty=true&size=3' -d '{
"query": {
   "match_all": {}
},
"sort": {
   "_script" : {
     "script_id" : "my_script",
     "lang" : "groovy",
     "type" : "number",
     "ignore_unmapped" : true,
     "params" : {
       "factor" : 1.1
     },
     "order" : "asc"
   }
}
}'

How it works...

ElasticSearch allows you to load your scripts in different ways; each of these methods has its pros and cons.

The most secure way to load or import scripts is to provide them as files in the config/scripts directory. This directory is continuously scanned for new files (by default, every 60 seconds). The scripting language is automatically detected by the file extension, and the script name depends on the filename.

If the file is put in subdirectories, the directory path becomes part of the script name; for example, if it is config/scripts/mysub1/mysub2/my_script.groovy, the script name will be mysub1_mysub2_my_script. If the script is provided via the filesystem, it can be referenced in the code via the "script": "script_name" parameter.
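
The naming rule described above can be sketched in a few lines of Python. This is only an illustration of the rule as stated in this recipe, not ElasticSearch code; the helper name script_name_and_extension is hypothetical, and the path is given relative to config/scripts.

```python
from pathlib import PurePosixPath


def script_name_and_extension(relative_path):
    """Map a path relative to config/scripts to (script name, file extension).

    Directory components are joined to the file name with underscores,
    and the extension is what the language detection is based on.
    """
    path = PurePosixPath(relative_path)
    extension = path.suffix.lstrip(".")
    name = "_".join(path.with_suffix("").parts)
    return name, extension


if __name__ == "__main__":
    print(script_name_and_extension("mysub1/mysub2/my_script.groovy"))
    print(script_name_and_extension("my_script.groovy"))
```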

Scripts can also be stored in the special .scripts index. These are the REST endpoints:

To retrieve a script, use the following code:

GET http://<server>/_scripts/<language>/<id>

To store a script, use the following code:

PUT http://<server>/_scripts/<language>/<id>

To delete a script, use the following code:

DELETE http://<server>/_scripts/<language>/<id>

The indexed script can be referenced in the code via the "script_id": "id_of_the_script" parameter. The recipes that follow use inline scripting because it is easier to work with during the development and testing phases.

Generally, a good practice is to develop using inline dynamic scripting in a request, because it's faster to prototype. Once the script is ready and no more changes are needed, it can be stored in the index, since it is then simpler to call and manage. In production, a best practice is to disable dynamic scripting and store the scripts on disk (generally, by dumping the indexed scripts to disk).
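
When a script is sent in a JSON body, the double quotes inside the script source must be escaped. A JSON library does this automatically, as the following Python sketch shows. The helper names scripts_url and stored_script_body are hypothetical, and the snippet only builds the URL and the body; it does not contact a cluster.

```python
import json


def scripts_url(server, language, script_id):
    """Build the indexed-script endpoint in the form /_scripts/<language>/<id>."""
    return "http://%s/_scripts/%s/%s" % (server, language, script_id)


def stored_script_body(source):
    """Serialize the script source; json.dumps escapes the inner quotes."""
    return json.dumps({"script": source})


if __name__ == "__main__":
    url = scripts_url("localhost:9200", "groovy", "my_script")
    body = stored_script_body('doc["price"].value * factor')
    print(url)
    print(body)
```

The printed body can be passed to curl with the -d option or sent with any HTTP client.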

Sorting data using scripts

ElasticSearch provides scripting support for the sorting functionality. In real world applications, there is often a need to modify the default sort by the match score using an algorithm that depends on the context and some external variables. Some common scenarios are given as follows:

Sorting places near a point

Sorting by most-read articles

Sorting items by custom user logic

Sorting items by revenue

Getting ready

You will need a working ElasticSearch cluster and an index populated with the script, which is available at https://github.com/aparo/elasticsearch-cookbook-second-edition.

How to do it...

In order to sort using scripting, perform the following steps:

If you want to order your documents by the price field multiplied by a factor parameter (for example, a sales tax), the search will be as shown in the following code:

curl -XGET 'http://127.0.0.1:9200/test-index/test-type/_search?&pretty=true&size=3' -d '{
"query": {
   "match_all": {}
},
"sort": {
   "_script" : {
     "script" : "doc[\"price\"].value * factor",
     "lang" : "groovy",
     "type" : "number",
     "ignore_unmapped" : true,
     "params" : {
       "factor" : 1.1
     },
     "order" : "asc"
   }
}
}'

In this case, we have used a match_all query and a sort script.

If everything is correct, the result returned by ElasticSearch should be as shown in the following code:

{
"took" : 7,
"timed_out" : false,
"_shards" : {
   "total" : 5,
   "successful" : 5,
   "failed" : 0
},
"hits" : {
   "total" : 1000,
   "max_score" : null,
   "hits" : [ {
     "_index" : "test-index",
     "_type" : "test-type",
     "_id" : "161",
     "_score" : null, "_source" : … truncated …,
     "sort" : [ 0.0278578661440021 ]
   }, {
     "_index" : "test-index",
     "_type" : "test-type",
     "_id" : "634",
     "_score" : null, "_source" : … truncated …,
    "sort" : [ 0.08131364254827411 ]
   }, {
     "_index" : "test-index",
     "_type" : "test-type",
     "_id" : "465",
     "_score" : null, "_source" : … truncated …,
     "sort" : [ 0.1094966959069832 ]
   } ]
}
}

How it works...

The sort scripting allows you to define several parameters, as follows:

order ("asc" or "desc"; the default is "asc"): This determines whether the order must be ascending or descending.

script: This contains the code to be executed.

type: This defines the type to which the computed value is converted (number, in the preceding example).

params (optional, a JSON object): This defines the parameters that need to be passed.

lang (by default, groovy): This defines the scripting language to be used.

ignore_unmapped (optional): This ignores unmapped fields in a sort. This flag allows you to avoid errors due to missing fields in shards.

Extending the sort with scripting gives you a broader approach to ranking your hits.

ElasticSearch scripting lets you run any code that the scripting language allows, so you can create complex custom algorithms to score your documents.
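
Conceptually, a sort script is evaluated once per hit, and the resulting values become the sort keys. The following Python sketch emulates that behavior on a few in-memory documents. It is an illustration under stated assumptions, not how ElasticSearch is implemented: the script_sort helper and the sample documents are hypothetical.

```python
def script_sort(docs, script, params, order="asc"):
    """Evaluate script(doc, **params) for every document and sort by the result.

    Returns a list of (sort_value, doc) pairs, similar to the "sort" array
    that accompanies each hit in a search response.
    """
    keyed = [(script(doc, **params), doc) for doc in docs]
    keyed.sort(key=lambda pair: pair[0], reverse=(order == "desc"))
    return keyed


# Hypothetical documents; only the price field matters here.
docs = [
    {"_id": "a", "price": 10.0},
    {"_id": "b", "price": 2.5},
    {"_id": "c", "price": 7.0},
]

# Python counterpart of the Groovy expression: doc["price"].value * factor
result = script_sort(docs, lambda doc, factor: doc["price"] * factor, {"factor": 1.1})

if __name__ == "__main__":
    for sort_value, doc in result:
        print(doc["_id"], sort_value)
```

Passing the factor through params instead of hardcoding it in the script keeps the script text constant across requests.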

There's more...

Groovy provides a lot of built-in functions (mainly taken from Java's Math class) that can be used in scripts, as shown in the following table:

Function	Description
time()	The current time in milliseconds
sin(a)	Returns the trigonometric sine of an angle
cos(a)	Returns the trigonometric cosine of an angle
tan(a)	Returns the trigonometric tangent of an angle
asin(a)	Returns the arc sine of a value
acos(a)	Returns the arc cosine of a value
atan(a)	Returns the arc tangent of a value
toRadians(angdeg)	Converts an angle measured in degrees to an approximately equivalent angle measured in radians
toDegrees(angrad)	Converts an angle measured in radians to an approximately equivalent angle measured in degrees
exp(a)	Returns Euler's number raised to the power of a value
log(a)	Returns the natural logarithm (base e) of a value
log10(a)	Returns the base 10 logarithm of a value
sqrt(a)	Returns the correctly rounded positive square root of a value
cbrt(a)	Returns the cube root of a double value
IEEEremainder(f1, f2)	Computes the remainder operation on two arguments, as prescribed by the IEEE 754 standard
ceil(a)	Returns the smallest (closest to negative infinity) value that is greater than or equal to the argument and is equal to a mathematical integer
floor(a)	Returns the largest (closest to positive infinity) value that is less than or equal to the argument and is equal to a mathematical integer
rint(a)	Returns the value that is closest in value to the argument and is equal to a mathematical integer
atan2(y, x)	Returns the angle theta from the conversion of rectangular coordinates (x, y) to polar coordinates (r, theta)
pow(a, b)	Returns the value of the first argument raised to the power of the second argument
round(a)	Returns the closest integer to the argument
random()	Returns a random double value
abs(a)	Returns the absolute value of a value
max(a, b)	Returns the greater of the two values
min(a, b)	Returns the smaller of the two values
ulp(d)	Returns the size of the unit in the last place of the argument
signum(d)	Returns the signum function of the argument
sinh(x)	Returns the hyperbolic sine of a value
cosh(x)	Returns the hyperbolic cosine of a value
tanh(x)	Returns the hyperbolic tangent of a value
hypot(x,y)	Returns sqrt(x^2+y^2) without an intermediate overflow or underflow
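
Several of these descriptions are easier to read with concrete numbers. The following sketch uses Python's math module, whose functions of the same name follow the same mathematical definitions, purely as an illustration; it is not Groovy code, and the sample inputs are arbitrary.

```python
import math

examples = {
    # ceil: smallest integer value greater than or equal to the argument
    "ceil(-1.5)": math.ceil(-1.5),
    # floor: largest integer value less than or equal to the argument
    "floor(-1.5)": math.floor(-1.5),
    # hypot: sqrt(x^2 + y^2)
    "hypot(3, 4)": math.hypot(3, 4),
    # atan2: angle theta of the point (x=1, y=1) in polar coordinates
    "atan2(1, 1)": math.atan2(1, 1),
    # toRadians / toDegrees counterparts
    "radians(180)": math.radians(180),
    "degrees(pi)": math.degrees(math.pi),
}

if __name__ == "__main__":
    for label, value in examples.items():
        print(label, "=", value)
```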

If you want to retrieve records in a random order, you can use a script with a random method, as shown in the following code:

curl -XGET 'http://127.0.0.1:9200/test-index/test-type/_search?&pretty=true&size=3' -d '{
 "query": {
   "match_all": {}
 },
 "sort": {
   "_script" : {
     "script" : "Math.random()",
     "lang" : "groovy",
     "type" : "number",
     "params" : {}
   }
 }
}'

In this example, for every hit, the new sort value is computed by executing the Math.random() scripting function.

Computing return fields with scripting

ElasticSearch allows you to define complex expressions that can be used to return a new calculated field value. These special fields are called script_fields, and they can be expressed with a script in every available ElasticSearch scripting language.

Getting ready

You will need a working ElasticSearch cluster and an index populated with the script (chapter_06/populate_aggregations.sh), which is available at https://github.com/aparo/elasticsearch-cookbook-second-edition.

How to do it...

In order to compute return fields with scripting, perform the following steps:

Return the following script fields:
- "my_calc_field": This concatenates the text of the "name" and "description" fields
- "my_calc_field2": This multiplies the "price" value by the "discount" parameter

From the command line, execute the following code:

curl -XGET 'http://127.0.0.1:9200/test-index/test-type/_search?&pretty=true&size=3' -d '{
"query": {
   "match_all": {}
},
"script_fields" : {
   "my_calc_field" : {
     "script" : "doc[\"name\"].value + \" -- \" + doc[\"description\"].value"
   },
   "my_calc_field2" : {
     "script" : "doc[\"price\"].value * discount",
     "params" : {
       "discount" : 0.8
     }
   }
}
}'

If everything works correctly, the result returned by ElasticSearch should be as follows:

{
"took" : 4,
"timed_out" : false,
"_shards" : {
   "total" : 5,
   "successful" : 5,
   "failed" : 0
},
"hits" : {
   "total" : 1000,
   "max_score" : 1.0,
   "hits" : [ {
     "_index" : "test-index",
     "_type" : "test-type",
     "_id" : "4",
     "_score" : 1.0,
     "fields" : {
       "my_calc_field" : "entropic -- accusantium",
       "my_calc_field2" : 5.480038242170081
     }
   }, {
     "_index" : "test-index",
     "_type" : "test-type",
     "_id" : "9",
     "_score" : 1.0,
     "fields" : {
       "my_calc_field" : "frankie -- accusantium",
       "my_calc_field2" : 34.79852410178313
     }
   }, {
     "_index" : "test-index",
     "_type" : "test-type",
     "_id" : "11",
     "_score" : 1.0,
     "fields" : {
       "my_calc_field" : "johansson -- accusamus",
       "my_calc_field2" : 11.824173084636591
     }
   } ]
}
}

How it works...

Script fields are similar to executing an SQL function on a field during a select operation.

In ElasticSearch, after the search phase is executed and the hits to be returned are determined, any fields defined in the request (standard or script) are calculated and returned.

A script field, which can be defined in any of the supported languages, is processed by passing the document to the script; if other parameters are defined for the script (such as discount in the preceding example), they are passed to the script function as well.

The script function is a code snippet; it can contain anything that the language allows you to write, but it must evaluate to a value (or a list of values).
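
The evaluation described above can be emulated in Python: every returned hit is passed to each script together with its parameters, and the results are collected under "fields". This is a minimal sketch of the idea, not ElasticSearch code; the apply_script_fields helper and the sample document are hypothetical.

```python
def apply_script_fields(hits, script_fields):
    """Compute every script field for every hit.

    script_fields maps a field name to {"script": callable, "params": {...}};
    the callable receives the document plus the params as keyword arguments.
    """
    results = []
    for doc in hits:
        fields = {}
        for name, spec in script_fields.items():
            fields[name] = spec["script"](doc, **spec.get("params", {}))
        results.append({"_id": doc["_id"], "fields": fields})
    return results


# Hypothetical hit with the fields used in this recipe.
hits = [{"_id": "1", "name": "alpha", "description": "first", "price": 10.0}]

script_fields = {
    "my_calc_field": {
        "script": lambda doc: doc["name"] + " -- " + doc["description"],
    },
    "my_calc_field2": {
        "script": lambda doc, discount: doc["price"] * discount,
        "params": {"discount": 0.8},
    },
}

computed = apply_script_fields(hits, script_fields)

if __name__ == "__main__":
    print(computed)
```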

Filtering a search via scripting

ElasticSearch scripting allows you to extend the traditional filter with custom scripts. Using scripting to create a custom filter is a convenient way to write filtering rules that are not provided by Lucene or ElasticSearch, and to implement business logic that is not available in the query DSL.

Getting ready

You will need a working ElasticSearch cluster and an index populated with the (chapter_06/populate_aggregations.sh) script, which is available at https://github.com/aparo/elasticsearch-cookbook-second-edition.

How to do it...

In order to filter a search using a script, perform the following steps:

Write a search with a filter that filters out the documents whose age value is not greater than the parameter value:

curl -XGET 'http://127.0.0.1:9200/test-index/test-type/_search?&pretty=true&size=3' -d '{
"query": {
   "filtered": {
     "filter": {
       "script": {
         "script": "doc[\"age\"].value > param1",
         "params" : {
           "param1" : 80
         }
       }
     },
     "query": {
       "match_all": {}
     }
   }
}
}'

In this example, all the documents in which the value of age is greater than param1 are qualified to be returned.

If everything works correctly, the result returned by ElasticSearch should be as shown here:

{
"took" : 30,
"timed_out" : false,
"_shards" : {
   "total" : 5,
   "successful" : 5,
   "failed" : 0
},
"hits" : {
   "total" : 237,
   "max_score" : 1.0,
   "hits" : [ {
     "_index" : "test-index",
     "_type" : "test-type",
     "_id" : "9",
     "_score" : 1.0, "_source" :{ … "age": 83, … }
   }, {
     "_index" : "test-index",
     "_type" : "test-type",
     "_id" : "23",
     "_score" : 1.0, "_source" : { … "age": 87, … }
   }, {
     "_index" : "test-index",
     "_type" : "test-type",
     "_id" : "47",
     "_score" : 1.0, "_source" : {…. "age": 98, …}
   } ]
}
}

How it works...

The script filter is a script that returns a Boolean value (true/false). For every hit, the script is evaluated, and if it returns true, the hit passes the filter. This type of scripting can only be used in Lucene filters, not in queries, because it doesn't affect the search (the exceptions are constant_score and custom_filters_score).

These are the scripting fields:

script: This contains the code to be executed

params: These are optional parameters to be passed to the script

lang (defaults to groovy): This defines the language of the script

The script code can be any code in your preferred and supported scripting language that returns a Boolean value.

There's more...

Other languages are used in the same way as Groovy.

For the current example, I have chosen a standard comparison that works in several languages. To execute the same script using the JavaScript language, use the following code:

curl -XGET 'http://127.0.0.1:9200/test-index/test-type/_search?&pretty=true&size=3' -d '{
 "query": {
   "filtered": {
     "filter": {
       "script": {
         "script": "doc[\"age\"].value > param1",
         "lang":"javascript",
         "params" : {
           "param1" : 80
         }
       }
     },
     "query": {
       "match_all": {}
     }
   }
 }
}'

For Python, use the following code:

curl -XGET 'http://127.0.0.1:9200/test-index/test-type/_search?&pretty=true&size=3' -d '{
 "query": {
   "filtered": {
     "filter": {
       "script": {
         "script": "doc[\"age\"].value > param1",
         "lang":"python",
         "params" : {
           "param1" : 80
         }
       }
     },
     "query": {
       "match_all": {}
     }
   }
 }
}'
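
Since only the lang value differs between these requests, the body can be generated programmatically. The following Python sketch builds the same filtered query for each language and serializes it with json.dumps, which also takes care of escaping the quotes inside the script. The script_filter_request helper is hypothetical, and the snippet only builds the bodies; it does not send them to a cluster.

```python
import json


def script_filter_request(script, params, lang="groovy"):
    """Build a filtered query whose filter is a script in the given language."""
    return {
        "query": {
            "filtered": {
                "filter": {
                    "script": {
                        "script": script,
                        "lang": lang,
                        "params": params,
                    }
                },
                "query": {"match_all": {}},
            }
        }
    }


SCRIPT = 'doc["age"].value > param1'

# One serialized body per language used in this recipe.
bodies = {
    lang: json.dumps(script_filter_request(SCRIPT, {"param1": 80}, lang))
    for lang in ("groovy", "javascript", "python")
}

if __name__ == "__main__":
    for lang, body in bodies.items():
        print(lang, body)
```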

Summary

In this article, you have learned how to use scripting to extend the functional capabilities of ElasticSearch using different programming languages.