A query is divided by Apache Lucene into terms and operators. A term, in Lucene, can be a single word or a phrase (group of words surrounded by double quote characters). If the query is set to be analyzed, the defined analyzer will be used on each of the terms that form the query.
A query can also contain Boolean operators that connect terms to each other forming clauses. The list of Boolean operators is as follows:
AND: It means that the given two terms (left and right operand) need to match in order for the clause to be matched. For example, we would run a query, such as apache AND lucene, to match documents with both apache and lucene terms in a document field.
OR: It means that any of the given terms may match in order for the clause to be matched. For example, we would run a query, such as apache OR lucene, to match documents with apache or lucene (or both) terms in a document field.
NOT: It means that in order for the document to be considered a match, the term appearing after the NOT operator must not match. For example, we would run a query lucene NOT Elasticsearch to match documents that contain lucene term, but not the Elasticsearch term in the document field.
In addition to these, we may use the following operators:
+: It means that the given term needs to be matched in order for the document to be considered as a match. For example, in order to find documents that match lucene term and may match apache term, we would run a query such as +lucene apache.
-: It means that the given term can't be matched in order for the document to be considered a match. For example, in order to find a document with lucene term, but not Elasticsearch term, we would run a query such as +lucene -Elasticsearch.
When not specifying any of the previous operators, the default OR operator will be used.
In addition to all these, there is one more thing: you can use parenthesis to group clauses together for example, with something like the following query:
Of course, just like in Elasticsearch, in Lucene all your data is stored in fields that build the document. In order to run a query against a field, you need to provide the field name, add the colon character, and provide the clause that should be run against that field. For example, if you would like to match documents with the term Elasticsearch in the title field, you would run the following query:
You can also group multiple clauses. For example, if you would like your query to match all the documents having the Elasticsearch term and the mastering book phrase in the title field, you could run a query like the following code:
The previous query can also be expressed in the following way:
In addition to the standard field query with a simple term or clause, Lucene allows us to modify the terms we pass in the query with modifiers. The most common modifiers, which you will be familiar with, are wildcards. There are two wildcards supported by Lucene, the ? and * terms. The first one will match any character and the second one will match multiple characters.
Note
Please note that by default these wildcard characters can't be used as the first character in a term because of performance reasons.
In addition to this, Lucene supports fuzzy and proximity searches with the use of the ~ character and an integer following it. When used with a single word term, it means that we want to search for terms that are similar to the one we've modified (the so-called fuzzy search). The integer after the ~ character specifies the maximum number of edits that can be done to consider the term similar. For example, if we would run a query, such as writer~2, both the terms writer and writers would be considered a match.
When the ~ character is used on a phrase, the integer number we provide is telling Lucene how much distance between the words is acceptable. For example, let's take the following query:
It would match the document with the title field containing mastering Elasticsearch, but not mastering book Elasticsearch. However, if we would run a query, such as title:"mastering Elasticsearch"~2, it would result in both example documents matched.
We can also use boosting to increase our term importance by using the ^ character and providing a float number. Boosts lower than one would result in decreasing the document importance. Boosts higher than one will result in increasing the importance. The default boost value is 1. Please refer to the Default Apache Lucene scoring explained section in Chapter 2, Power User Query DSL, for further information on what boosting is and how it is taken into consideration during document scoring.
In addition to all these, we can use square and curly brackets to allow range searching. For example, if we would like to run a range search on a numeric field, we could run the following query:
The preceding query would result in all documents with the price field between 10.00 and 15.00 inclusive.
In case of string-based fields, we also can run a range query, for example name:[Adam TO Adria].
The preceding query would result in all documents containing all the terms between Adam and Adria in the name field including them.
If you would like your range bound or bounds to be exclusive, use curly brackets instead of the square ones. For example, in order to find documents with the price field between 10.00 inclusive and 15.00 exclusive, we would run the following query:
If you would like your range bound from one side and not bound by the other, for example querying for documents with a price higher than 10.00, we would run the following query:
Handling special characters
In case you want to search for one of the special characters (which are +, -, &&, ||, !, (, ), { }, [ ], ^, ", ~, *, ?, :, \, /), you need to escape it with the use of the backslash (\) character. For example, to search for the abc"efg term you need to do something like abc\"efg.