





















































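In this article by Alberto Paro, the author of ElasticSearch Cookbook Second Edition, we will cover recipes on installing additional script plugins, managing scripts, sorting data with scripts, computing return fields with scripting, and filtering a search with scripting.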
ElasticSearch has a powerful way of extending its capabilities with custom scripts, which can be written in several programming languages. The most common ones are Groovy, MVEL, JavaScript, and Python.
In this article, we will see how it's possible to create custom scoring algorithms, special processed return fields, custom sorting, and complex update operations on records.
The scripting concept of ElasticSearch can be seen as an advanced stored procedure system for the NoSQL world, so mastering it is essential for advanced use of ElasticSearch.
ElasticSearch provides native scripting (Java code compiled into a JAR) and Groovy, and other languages, such as JavaScript and Python, are also available. In ElasticSearch releases prior to version 1.4, the official scripting language was MVEL. Because MVEL was not well maintained by its developers and could not be sandboxed to prevent security issues, it was replaced with Groovy. Groovy scripting is now provided by default in ElasticSearch; the other scripting languages can be installed as plugins.
You will need a working ElasticSearch cluster.
In order to install JavaScript language support for ElasticSearch (1.3.x), perform the following steps:
bin/plugin --install elasticsearch/elasticsearch-lang-javascript/2.3.0
-> Installing elasticsearch/elasticsearch-lang-javascript/2.3.0...
Trying http://download.elasticsearch.org/elasticsearch/elasticsearch-lang-javascript/elasticsearch-lang-javascript-2.3.0.zip...
Downloading ....DONE
Installed lang-javascript
If the installation is successful, the output will end with Installed; otherwise, an error is returned.
To install Python language support, run the same command with the Python language plugin:
bin/plugin --install elasticsearch/elasticsearch-lang-python/2.3.0
The version number depends on the ElasticSearch version. Take a look at the plugin's web page to choose the correct version.
Language plugins allow you to extend the number of supported languages to be used in scripting.
During the ElasticSearch startup, an internal ElasticSearch service called PluginService loads all the installed language plugins.
After installing or upgrading a plugin, you need to restart the node so that the plugin is loaded.
The ElasticSearch community provides common scripting languages (a list of the supported scripting languages is available on the ElasticSearch site plugin page at http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-plugins.html), and others are available in GitHub repositories (a simple search on GitHub allows you to find them).
The following are the most commonly used languages for scripting:
Groovy is preferred if the script is not too complex; otherwise, a native plugin provides a better environment to implement complex logic and data management.
Performance differs from language to language; the fastest option is native Java. Among the dynamic scripting languages, Groovy is faster than JavaScript and Python.
In order to access document properties in Groovy scripts, the same approach will work as in other scripting languages:
If the field contains a geopoint value, additional methods are available, as follows:
By using these helper methods, it is possible to create advanced scripts that boost a document by its distance from a point, which is very handy when developing geolocation-centered applications.
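To make the idea of boosting by distance concrete, here is a minimal Python sketch of the arithmetic such a script might perform. It is not the ElasticSearch geopoint API: the haversine formula, the `EARTH_RADIUS_KM` constant, and the `distance_boost` decay function are all assumptions chosen for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def distance_boost(distance_km):
    """Hypothetical boost: 1.0 at zero distance, decaying as distance grows."""
    return 1.0 / (1.0 + distance_km)


if __name__ == "__main__":
    near = haversine_km(45.0, 9.0, 45.1, 9.0)
    far = haversine_km(45.0, 9.0, 48.0, 9.0)
    # The nearer document receives the larger boost.
    print(distance_boost(near) > distance_boost(far))
```

Any monotonically decreasing function of the distance would serve the same purpose; the reciprocal form is used here only because it is simple.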
Depending on your scripting usage, there are several ways to customize ElasticSearch to use your script extensions.
In this recipe, we will see how to provide scripts to ElasticSearch via files, indexes, or inline.
You will need a working ElasticSearch cluster populated with the populate script (chapter_06/populate_aggregations.sh), available at https://github.com/aparo/elasticsearch-cookbook-second-edition.
To manage scripting, perform the following steps:
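A script can be provided as a file in the config/scripts directory, with content such as the following:
doc["price"].value * factor
The same script can be stored in the index through the REST API (note that the double quotes inside the script must be escaped in the JSON body):
curl -XPOST localhost:9200/_scripts/groovy/my_script -d '{
  "script": "doc[\"price\"].value * factor"
}'
Once stored, the indexed script can be referenced by its ID in a search, here as a sort script:
curl -XGET 'http://127.0.0.1:9200/test-index/test-type/_search?pretty=true&size=3' -d '{
  "query": {
    "match_all": {}
  },
  "sort": {
    "_script": {
      "script_id": "my_script",
      "lang": "groovy",
      "type": "number",
      "ignore_unmapped": true,
      "params": {
        "factor": 1.1
      },
      "order": "asc"
    }
  }
}'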
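Because the Groovy source contains double quotes of its own, hand-written JSON bodies are easy to get wrong. The following Python sketch builds the two request bodies with the standard `json` module, which escapes the inner quotes automatically. The helper names are hypothetical, and the sketch only builds the bodies; it does not contact a cluster.

```python
import json

# The Groovy source from the recipe; it contains double quotes of its own.
SCRIPT_SOURCE = 'doc["price"].value * factor'


def store_script_body(source):
    """Return the JSON body used to store a script in the index."""
    return json.dumps({"script": source})


def sort_by_script_id_body(script_id, factor):
    """Return a search body that sorts hits with an indexed script."""
    return json.dumps({
        "query": {"match_all": {}},
        "sort": {
            "_script": {
                "script_id": script_id,
                "lang": "groovy",
                "type": "number",
                "ignore_unmapped": True,
                "params": {"factor": factor},
                "order": "asc",
            }
        },
    })


if __name__ == "__main__":
    # The inner quotes come out escaped as \" in the printed body.
    print(store_script_body(SCRIPT_SOURCE))
    print(sort_by_script_id_body("my_script", 1.1))
```

The printed strings can be passed to curl with the -d option or sent with any HTTP client.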
ElasticSearch allows you to load your scripts in different ways; each of these methods has its pros and cons.
The most secure way to load or import scripts is to provide them as files in the config/scripts directory. This directory is continuously scanned for new files (by default, every 60 seconds). The scripting language is automatically detected by the file extension, and the script name depends on the filename.
If the file is put in subdirectories, the directory path becomes part of the filename; for example, if it is config/scripts/mysub1/mysub2/my_script.groovy, the script name will be mysub1_mysub2_my_script. If the script is provided via a filesystem, it can be referenced in the code via the "script": "script_name" parameter.
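The naming rule described above can be sketched in a few lines of Python. The function name and the default `scripts_dir` value are assumptions for illustration; the sketch returns the file extension as-is and does not reproduce how ElasticSearch maps extensions to languages.

```python
from pathlib import PurePosixPath


def script_name_and_extension(path, scripts_dir="config/scripts"):
    """Derive the script name and file extension from a script file path.

    Subdirectories below scripts_dir are joined to the base filename
    with underscores, as described in the text.
    """
    relative = PurePosixPath(path).relative_to(scripts_dir)
    extension = relative.suffix.lstrip(".")
    name = "_".join(relative.with_suffix("").parts)
    return name, extension


if __name__ == "__main__":
    print(script_name_and_extension(
        "config/scripts/mysub1/mysub2/my_script.groovy"))
```

For the example path from the text, the function yields the name mysub1_mysub2_my_script and the extension groovy.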
Scripts can also be stored in the special .scripts index. These are the REST endpoints:
GET http://<server>/_scripts/<language>/<id>
PUT http://<server>/_scripts/<language>/<id>
DELETE http://<server>/_scripts/<language>/<id>
The indexed script can be referenced in the code via the "script_id": "id_of_the_script" parameter. The recipes that follow will use inline scripting because it's easier to use it during the development and testing phases.
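As a rough illustration of the three endpoints, the following Python sketch prepares the corresponding HTTP requests with the standard library. The `script_request` helper and the `localhost:9200` address are assumptions; the requests are only constructed here, not sent.

```python
import json
import urllib.request


def script_request(method, server, lang, script_id, source=None):
    """Build (but do not send) a request for the indexed-script endpoint."""
    url = "http://{}/_scripts/{}/{}".format(server, lang, script_id)
    data = None
    if source is not None:
        # json.dumps escapes any double quotes inside the script source.
        data = json.dumps({"script": source}).encode("utf-8")
    return urllib.request.Request(url, data=data, method=method)


if __name__ == "__main__":
    put = script_request("PUT", "localhost:9200", "groovy", "my_script",
                         'doc["price"].value * factor')
    get = script_request("GET", "localhost:9200", "groovy", "my_script")
    delete = script_request("DELETE", "localhost:9200", "groovy", "my_script")
    for request in (put, get, delete):
        print(request.get_method(), request.full_url)
```

Passing one of these objects to `urllib.request.urlopen` would execute it against a running cluster.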
Generally, a good practice is to develop using the inline dynamic scripting in a request, because it's faster to prototype. Once the script is ready and no changes are needed, it can be stored in the index since it is simpler to call and manage. In production, a best practice is to disable dynamic scripting and store the script on the disk (generally, dumping the indexed script to disk).
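Dumping a script to disk amounts to writing its source to a file named after the script. A minimal Python sketch follows; the `dump_script` helper, its parameters, and the temporary directory standing in for the real config/scripts directory are all assumptions for illustration.

```python
import tempfile
from pathlib import Path


def dump_script(scripts_dir, script_id, extension, source):
    """Write a script source to <scripts_dir>/<script_id>.<extension>."""
    target = Path(scripts_dir) / "{}.{}".format(script_id, extension)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(source, encoding="utf-8")
    return target


def demo():
    """Dump one script into a throwaway directory and read it back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = dump_script(Path(tmp) / "config" / "scripts", "my_script",
                           "groovy", 'doc["price"].value * factor')
        return path.name, path.read_text(encoding="utf-8")


if __name__ == "__main__":
    print(demo())
```

In a real deployment, the source would first be fetched from the indexed-script endpoint and `scripts_dir` would point at the node's config/scripts directory.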
ElasticSearch provides scripting support for the sorting functionality. In real world applications, there is often a need to modify the default sort by the match score using an algorithm that depends on the context and some external variables. Some common scenarios are given as follows:
You will need a working ElasticSearch cluster and an index populated with the script, which is available at https://github.com/aparo/elasticsearch-cookbook-second-edition.
In order to sort using scripting, perform the following steps:
curl -XGET 'http://127.0.0.1:9200/test-index/test-type/_search?pretty=true&size=3' -d '{
  "query": {
    "match_all": {}
  },
  "sort": {
    "_script": {
      "script": "doc[\"price\"].value * factor",
      "lang": "groovy",
      "type": "number",
      "ignore_unmapped": true,
      "params": {
        "factor": 1.1
      },
      "order": "asc"
    }
  }
}'
In this case, we have used a match_all query and a sort script.
{
  "took" : 7,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1000,
    "max_score" : null,
    "hits" : [ {
      "_index" : "test-index",
      "_type" : "test-type",
      "_id" : "161",
      "_score" : null,
      "_source" : … truncated …,
      "sort" : [ 0.0278578661440021 ]
    }, {
      "_index" : "test-index",
      "_type" : "test-type",
      "_id" : "634",
      "_score" : null,
      "_source" : … truncated …,
      "sort" : [ 0.08131364254827411 ]
    }, {
      "_index" : "test-index",
      "_type" : "test-type",
      "_id" : "465",
      "_score" : null,
      "_source" : … truncated …,
      "sort" : [ 0.1094966959069832 ]
    } ]
  }
}
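Conceptually, a sort script computes one number per hit and the hits are then ordered by that number. The following Python sketch mimics that behavior locally; the sample documents and helper names are hypothetical and no cluster is involved.

```python
def sort_value(doc, factor):
    """Local stand-in for the price-times-factor sort script."""
    return doc["price"] * factor


def sort_hits(docs, factor, order="asc"):
    """Order documents by their computed sort value."""
    return sorted(docs, key=lambda doc: sort_value(doc, factor),
                  reverse=(order == "desc"))


# Hypothetical sample documents; only the price field matters here.
docs = [
    {"_id": "a", "price": 9.5},
    {"_id": "b", "price": 0.5},
    {"_id": "c", "price": 3.0},
]

if __name__ == "__main__":
    for doc in sort_hits(docs, factor=1.1):
        print(doc["_id"], sort_value(doc, 1.1))
```

With a positive factor the order is the same as sorting by price alone; the computed value is what appears in the sort array of each hit.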
The sort scripting allows you to define several parameters, as follows:
Extending the sort with scripting allows the use of a broader approach to score your hits.
ElasticSearch scripting lets you run any code that the scripting language supports, so you can create custom, complex algorithms to score your documents.
Groovy provides a lot of built-in functions (mainly taken from Java's Math class) that can be used in scripts, as shown in the following table:
| Function | Description |
|---|---|
| time() | The current time in milliseconds |
| sin(a) | Returns the trigonometric sine of an angle |
| cos(a) | Returns the trigonometric cosine of an angle |
| tan(a) | Returns the trigonometric tangent of an angle |
| asin(a) | Returns the arc sine of a value |
| acos(a) | Returns the arc cosine of a value |
| atan(a) | Returns the arc tangent of a value |
| toRadians(angdeg) | Converts an angle measured in degrees to an approximately equivalent angle measured in radians |
| toDegrees(angrad) | Converts an angle measured in radians to an approximately equivalent angle measured in degrees |
| exp(a) | Returns Euler's number raised to the power of a value |
| log(a) | Returns the natural logarithm (base e) of a value |
| log10(a) | Returns the base 10 logarithm of a value |
| sqrt(a) | Returns the correctly rounded positive square root of a value |
| cbrt(a) | Returns the cube root of a double value |
| IEEEremainder(f1, f2) | Computes the remainder operation on two arguments, as prescribed by the IEEE 754 standard |
| ceil(a) | Returns the smallest (closest to negative infinity) value that is greater than or equal to the argument and is equal to a mathematical integer |
| floor(a) | Returns the largest (closest to positive infinity) value that is less than or equal to the argument and is equal to a mathematical integer |
| rint(a) | Returns the value that is closest in value to the argument and is equal to a mathematical integer |
| atan2(y, x) | Returns the angle theta from the conversion of rectangular coordinates (x, y) to polar coordinates (r, theta) |
| pow(a, b) | Returns the value of the first argument raised to the power of the second argument |
| round(a) | Returns the closest integer to the argument |
| random() | Returns a random double value |
| abs(a) | Returns the absolute value of a value |
| max(a, b) | Returns the greater of the two values |
| min(a, b) | Returns the smaller of the two values |
| ulp(d) | Returns the size of the unit in the last place of the argument |
| signum(d) | Returns the signum function of the argument |
| sinh(x) | Returns the hyperbolic sine of a value |
| cosh(x) | Returns the hyperbolic cosine of a value |
| tanh(x) | Returns the hyperbolic tangent of a value |
| hypot(x,y) | Returns sqrt(x^2+y^2) without an intermediate overflow or underflow |
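Several of the rounding functions in the table are easy to confuse. The following Python sketch reproduces their behavior on a few values, assuming that the Groovy functions follow the semantics of Java's Math class as the text indicates. The `java_round` and `java_rint` helpers are approximations written for this illustration, not part of any ElasticSearch API.

```python
import math


def java_round(x):
    """Approximate round(a): nearest integer, ties go toward positive infinity."""
    return math.floor(x + 0.5)


def java_rint(x):
    """Approximate rint(a): nearest integer as a float, ties go to the even value."""
    return float(round(x))


if __name__ == "__main__":
    for value in (2.5, 3.5, -1.5):
        print(value, math.ceil(value), math.floor(value),
              java_round(value), java_rint(value))
    # hypot(x, y) is the length of the hypotenuse.
    print(math.hypot(3.0, 4.0))
```

The difference between round and rint shows up only on exact ties such as 2.5, where the first rounds up and the second picks the even neighbor.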
If you want to retrieve records in a random order, you can use a script with a random method, as shown in the following code:
curl -XGET 'http://127.0.0.1:9200/test-index/test-type/_search?&pretty=true&size=3' -d '{ "query": { "match_all": {} }, "sort": { "_script" : { "script" : "Math.random()", "lang" : "groovy", "type" : "number", "params" : {} } } }'
In this example, for every hit, the new sort value is computed by executing the Math.random() scripting function.
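The effect of a random sort script can be imitated locally: each hit receives a random key and the hits are ordered by that key. This Python sketch uses hypothetical document IDs and a seeded generator so that runs are repeatable; it illustrates the idea only and is not how ElasticSearch implements the sort.

```python
import random


def random_order(hits, rng):
    """Attach a random sort key to every hit and order the hits by it."""
    keyed = [(rng.random(), hit) for hit in hits]
    keyed.sort(key=lambda pair: pair[0])
    return [hit for _, hit in keyed]


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed, only to make the demo repeatable
    print(random_order(["161", "634", "465"], rng))
```

Every call draws new keys, so repeating the same search returns the hits in a different order each time.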
ElasticSearch allows you to define complex expressions that can be used to return a new calculated field value. These special fields are called script_fields, and they can be expressed with a script in every available ElasticSearch scripting language.
You will need a working ElasticSearch cluster and an index populated with the script (chapter_06/populate_aggregations.sh), which is available at https://github.com/aparo/elasticsearch-cookbook-second-edition.
In order to compute return fields with scripting, perform the following steps:
curl -XGET 'http://127.0.0.1:9200/test-index/test-type/_search?pretty=true&size=3' -d '{
  "query": {
    "match_all": {}
  },
  "script_fields": {
    "my_calc_field": {
      "script": "doc[\"name\"].value + \" -- \" + doc[\"description\"].value"
    },
    "my_calc_field2": {
      "script": "doc[\"price\"].value * discount",
      "params": {
        "discount": 0.8
      }
    }
  }
}'
{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1000,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "test-index",
      "_type" : "test-type",
      "_id" : "4",
      "_score" : 1.0,
      "fields" : {
        "my_calc_field" : "entropic -- accusantium",
        "my_calc_field2" : 5.480038242170081
      }
    }, {
      "_index" : "test-index",
      "_type" : "test-type",
      "_id" : "9",
      "_score" : 1.0,
      "fields" : {
        "my_calc_field" : "frankie -- accusantium",
        "my_calc_field2" : 34.79852410178313
      }
    }, {
      "_index" : "test-index",
      "_type" : "test-type",
      "_id" : "11",
      "_score" : 1.0,
      "fields" : {
        "my_calc_field" : "johansson -- accusamus",
        "my_calc_field2" : 11.824173084636591
      }
    } ]
  }
}
The scripting fields are similar to executing an SQL function on a field during a select operation.
In ElasticSearch, after a search phase is executed and the hits to be returned are calculated, if some fields (standard or script) are defined, they are calculated and returned.
The script field, which can be defined in any of the supported languages, is evaluated against the values of the document; if parameters are defined for the script (such as discount in the example), they are passed to the script function as well.
The script function is a code snippet; it can contain anything that the language allows you to write, but it must evaluate to a value (or a list of values).
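The evaluation of the two script fields can be mimicked in plain Python. The sketch below applies the same expressions to one hypothetical document; the `script_fields` helper and the sample price are assumptions made for illustration.

```python
def script_fields(doc, discount):
    """Compute the two calculated fields of the example for one document."""
    return {
        # String concatenation of two document fields.
        "my_calc_field": doc["name"] + " -- " + doc["description"],
        # Numeric field multiplied by a parameter passed to the script.
        "my_calc_field2": doc["price"] * discount,
    }


if __name__ == "__main__":
    # Hypothetical document; the price is made up for the demo.
    doc = {"name": "entropic", "description": "accusantium", "price": 10.0}
    print(script_fields(doc, discount=0.8))
```

Each expression returns a single value, which is what the search response reports under the fields key of the hit.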
ElasticSearch scripting allows you to extend the traditional filter with custom scripts. Using scripting to create a custom filter is a convenient way to write scripting rules that are not provided by Lucene or ElasticSearch, and to implement business logic that is not available in the query DSL.
You will need a working ElasticSearch cluster and an index populated with the populate script (chapter_06/populate_aggregations.sh), which is available at https://github.com/aparo/elasticsearch-cookbook-second-edition.
In order to filter a search using a script, perform the following steps:
curl -XGET 'http://127.0.0.1:9200/test-index/test-type/_search?pretty=true&size=3' -d '{
  "query": {
    "filtered": {
      "filter": {
        "script": {
          "script": "doc[\"age\"].value > param1",
          "params": {
            "param1": 80
          }
        }
      },
      "query": {
        "match_all": {}
      }
    }
  }
}'
In this example, all the documents in which the value of age is greater than param1 are qualified to be returned.
{
  "took" : 30,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 237,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "test-index",
      "_type" : "test-type",
      "_id" : "9",
      "_score" : 1.0,
      "_source" : { … "age": 83, … }
    }, {
      "_index" : "test-index",
      "_type" : "test-type",
      "_id" : "23",
      "_score" : 1.0,
      "_source" : { … "age": 87, … }
    }, {
      "_index" : "test-index",
      "_type" : "test-type",
      "_id" : "47",
      "_score" : 1.0,
      "_source" : { … "age": 98, … }
    } ]
  }
}
The script filter is a script that returns a Boolean value (true/false). For every hit, the script is evaluated, and if it returns true, the hit passes the filter. This type of script can only be used as a Lucene filter, not as a query, because it doesn't affect the search (the exceptions are constant_score and custom_filters_score).
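The per-hit evaluation described above behaves like a predicate applied to each document. Here is a minimal Python sketch with hypothetical sample documents; it mirrors the age comparison of the example and runs without a cluster.

```python
def passes_filter(doc, param1):
    """Local stand-in for the age comparison in the script filter."""
    return doc["age"] > param1


def script_filter(docs, param1):
    """Keep only the documents for which the predicate returns True."""
    return [doc for doc in docs if passes_filter(doc, param1)]


# Hypothetical sample documents.
docs = [
    {"_id": "9", "age": 83},
    {"_id": "10", "age": 42},
    {"_id": "47", "age": 98},
]

if __name__ == "__main__":
    print([doc["_id"] for doc in script_filter(docs, param1=80)])
```

The predicate only decides whether a hit is kept; it contributes nothing to the score, which matches the constant score of 1.0 in the response.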
These are the scripting fields:
The script code can be any code in your preferred and supported scripting language that returns a Boolean value.
Other languages are used in the same way as Groovy.
For the current example, I have chosen a standard comparison that works in several languages. To execute the same script using the JavaScript language, use the following code:
curl -XGET 'http://127.0.0.1:9200/test-index/test-type/_search?pretty=true&size=3' -d '{
  "query": {
    "filtered": {
      "filter": {
        "script": {
          "script": "doc[\"age\"].value > param1",
          "lang": "javascript",
          "params": {
            "param1": 80
          }
        }
      },
      "query": {
        "match_all": {}
      }
    }
  }
}'
For Python, use the following code:
curl -XGET 'http://127.0.0.1:9200/test-index/test-type/_search?pretty=true&size=3' -d '{
  "query": {
    "filtered": {
      "filter": {
        "script": {
          "script": "doc[\"age\"].value > param1",
          "lang": "python",
          "params": {
            "param1": 80
          }
        }
      },
      "query": {
        "match_all": {}
      }
    }
  }
}'
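Since the three requests differ only in the lang value, the body can be generated rather than copied. The following Python sketch builds it for any language name; the `script_filter_body` helper is hypothetical, and the sketch assumes the corresponding language plugin is installed on the cluster that receives the request.

```python
import json


def script_filter_body(lang, threshold):
    """Build the filtered query of the example for a given script language."""
    return {
        "query": {
            "filtered": {
                "filter": {
                    "script": {
                        "script": 'doc["age"].value > param1',
                        "lang": lang,
                        "params": {"param1": threshold},
                    }
                },
                "query": {"match_all": {}},
            }
        }
    }


if __name__ == "__main__":
    for lang in ("groovy", "javascript", "python"):
        # json.dumps escapes the quotes inside the script source.
        print(json.dumps(script_filter_body(lang, 80)))
```

Each printed line is a valid JSON body that can be passed to curl with the -d option.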
In this article you have learnt the ways you can use scripting to extend the ElasticSearch functional capabilities using different programming languages.