In this article by James Miller, author of the book Mastering Splunk, we will discuss Splunk lookups and workflows. The topics covered include static lookups, dynamic (external) lookups, time-based lookups, and the search commands most often used with them.
Machines constantly generate data, usually in a raw form that is most efficient for processing by machines, but not easily understood by "human" data consumers. Splunk has the ability to identify unique identifiers and/or result or status codes within the data. This gives you the ability to enhance the readability of the data by adding descriptions or names as new search result fields. These fields contain information from an external source such as a static table (a CSV file) or the dynamic result of a Python command or a Python-based script.
Splunk's lookups can use information within returned events or time information to determine how to add other fields from your previously defined external data sources.
To illustrate, consider a simple Splunk static lookup that matches the business unit code in each event against an external table and adds the corresponding business unit name to the event as a new field.
So, if you have an event where the Business Unit value is equal to 999999, the lookup will add the Business Unit Name value as Corporate Office to that event.
More sophisticated lookups can be applied automatically to every search against a particular source type, call an external script rather than read a static file, or match on time values; all of these techniques are covered later in this article.
Let's take a look at an example of a search pipeline that creates a table based on IBM Cognos TM1 file extractions:
sourcetype=csv 2014 "Current Forecast" "Direct" "513500" |
rename May as "Month" Actual as "Version" "FY 2012" as Year 650693NLR001 as "Business Unit" 100000
as "FCST" "09997_Eliminations Co 2" as "Account" "451200" as "Activity" | eval RFCST= round(FCST) |
Table Month, "Business Unit", RFCST
The following table shows the results generated:
Now, add the lookup command to our search pipeline to have Splunk convert Business Unit into Business Unit Name:
sourcetype=csv 2014 "Current Forecast" "Direct" "513500" |
rename May as "Month" Actual as "Version" "FY 2012" as Year 650693NLR001 as "Business Unit" 100000 as "FCST" "09997_Eliminations Co 2"
as "Account" "451200" as "Activity" | eval RFCST= round(FCST) |
lookup BUtoBUName BU as "Business Unit" OUTPUT BUName as "Business Unit Name" | Table Month, "Business Unit", "Business Unit Name", RFCST
The lookup command in our Splunk search pipeline will now add Business Unit Name in the results table:
In this section, we will configure a simple Splunk lookup.
You can set up a lookup using the Lookups page (in Splunk Web) or by configuring stanzas in the props.conf and transforms.conf files. Let's take the easier approach first and use the Splunk Web interface.
Before we begin, we need to establish our lookup table, which will be an industry-standard comma-separated values (CSV) file. Our example converts business unit codes to more user-friendly business unit names. For example, we have the following information:
Business unit code | Business unit name
999999 | Corporate office
VA0133SPS001 | South-western
VA0133NLR001 | North-east
685470NLR001 | Mid-west
In the events data, only business unit codes are included. In an effort to make our Splunk search results more readable, we want to add the business unit name to our results table. To do this, we've converted our information (shown in the preceding table) to a CSV file (named BUtoBUName.csv):
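The file itself is not reproduced here, but given the field names used by the lookup command later in this article (BU and BUName), its contents would look something like the following sketch:
BU,BUName
999999,Corporate office
VA0133SPS001,South-western
VA0133NLR001,North-east
685470NLR001,Mid-west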
For this example, we've kept our lookup table simple, but lookup tables (files) can be as complex as you need them to be. They can have numerous fields (columns) in them.
A Splunk lookup table has a few requirements, as follows:
Now, from Splunk Web, we can click on Settings and then select Lookups:
From the Lookups page, we can select Lookup table files:
From the Lookup table files page, we can add our new lookup file (BUtoBUName.csv):
By clicking on the New button, we see the Add new page where we can set up our file by doing the following:
Then, we click on Save:
Once you click on Save, you should receive the Successfully saved "BUtoBUName" in search message:
In the previous screenshot, the lookup file is saved by default as private. You will need to adjust permissions to allow other Splunk users to use it.
Going back to the Lookups page, we can select Lookup definitions to see the Lookup definitions page:
In the Lookup definitions page, we can click on New to visit the Add new page (shown in the following screenshot) and set up our definition as follows:
Again, we should see the Successfully saved "BUtoBUName" in search message:
Now, our lookup is ready to be used:
Rather than having to code for a lookup in each of your Splunk searches, you have the ability to configure automatic lookups for a particular source type. To do this from Splunk Web, we can click on Settings and then select Lookups:
From the Lookups page, click on Automatic lookups:
In the Automatic lookups page, click on New:
In the Add New page, we will fill in the required information to set up our lookup:
The Splunk Add new page (shown in the following screenshot) is where you enter the lookup information (detailed in the previous section):
Once you have entered your automatic lookup information, you can click on Save and you will receive the Successfully saved "Business Unit to Business Unit Name" in search message:
Now, we can use the lookup in a search. For example, you can run a search with sourcetype=csv, as follows:
sourcetype=csv 2014 "Current Forecast" "Direct" "513500" |
rename May as "Month" Actual as "Version" "FY 2012" as Year 650693NLR001 as "Business Unit" 100000
as "FCST" "09997_Eliminations Co 2" as "Account" "451200" as "Activity" | eval RFCST= round(FCST) |
Table "Business Unit", "Business Unit Name", Month, RFCST
Notice in the following screenshot that Business Unit Name is populated with the user-friendly values from our lookup table, and we didn't have to add the lookup command to our search pipeline:
In addition to using the Splunk web interface, you can define and configure lookups by editing the props.conf and transforms.conf configuration files.
To set up a lookup with these files (rather than using Splunk web), we can perform the following steps:
Whenever you edit a Splunk .conf file, always edit a local version, keeping the original (system directory version) intact.
In the current version of Splunk, there are two types of lookup tables: static and external. Static lookups use CSV files, and external (which are dynamic) lookups use Python scripting.
You have to decide whether your lookup will be static (using a file) or dynamic (using script commands). If you are using a file, you'll set filename; if you are going to use a script, you'll set external_cmd (both are set in the transforms.conf file). You can also limit the number of matching entries applied to an event by setting the max_matches option (this tells Splunk to use the first <integer> entries, in file order).
I've decided to leave the default for max_matches, so my transforms.conf file looks like the following:
[butobugroup]
filename = butobugroup.csv
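If you did want to cap how many matching entries are applied to each event, you could add the max_matches setting to the same stanza; the value of 3 here is purely illustrative:
[butobugroup]
filename = butobugroup.csv
max_matches = 3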
It is okay to have multiple field lookups defined in one source lookup definition, but each lookup should have its own unique lookup name; for example, if you have multiple tables, you can name them LOOKUP-table01, LOOKUP-table02, and so on, or something perhaps more easily understood.
If you add a lookup to your props.conf file, this lookup is automatically applied to all events from searches that have matching source types (again, as mentioned earlier; if your automatic lookup is very slow, it will also impact the speed of your searches).
To illustrate the use of configuration files in order to implement an automatic lookup, let's use a simple example.
Once again, we want to convert a unique identification code for an organization's business unit into a more user-friendly descriptive name called BU Group. We will match the field bu in the lookup table butobugroup.csv with a field in our events and then add the bugroup description to the returned events.
The following shows the contents of the butobugroup.csv file:
bu, bugroup
999999, leadership-group
VA0133SPS001, executive-group
650914FAC002, technology-group
You can put this file into $SPLUNK_HOME/etc/apps/<app_name>/lookups/ and carry out the following steps:
Put the butobugroup.csv file into $SPLUNK_HOME/etc/apps/search/lookups/, since we are using the search app.
Next, add the following stanza to your transforms.conf file:
[butobugroup]
filename = butobugroup.csv
Then, add the following stanza to your props.conf file so that the lookup is applied automatically to the csv source type:
[csv]
LOOKUP-check = butobugroup bu AS 650693NLR001 OUTPUT bugroup
Finally, restart Splunk. You can (assuming you are logged in as an admin or have admin privileges) restart the Splunk server through the web interface by going to Settings, then selecting System and finally Server controls.
Now, you can run a search for sourcetype=csv (as shown here):
sourcetype=csv 2014 "Current Forecast" "Direct" "513500" |
rename May as "Month", 650693NLR001 as "Business Unit" 100000 as "FCST" | eval RFCST= round(FCST) |
Table "Business Unit", "Business Unit Name", bugroup, Month, RFCST
You will see that the field bugroup can be returned as part of your event results:
Of course, you can create CSV files from external systems (or, perhaps even manually?), but from time to time, you might have the opportunity to create lookup CSV files (tables) from event data using Splunk. A handy command to accomplish this is outputcsv (which is covered in detail later in this article).
The following is a simple example of creating a CSV file from Splunk event data that can be used for a lookup table:
sourcetype=csv "Current Forecast" "Direct" | rename 650693NLR001 as "Business Unit" |
Table "Business Unit", "Business Unit Name", bugroup | outputcsv splunk_master
The results are shown in the following screenshot:
Of course, the output table isn't quite usable, since the results have duplicates. Therefore, we can rewrite the Splunk search pipeline introducing the dedup command (as shown here):
sourcetype=csv "Current Forecast" "Direct" | rename 650693NLR001 as "Business Unit" |
dedup "Business Unit" | Table "Business Unit", "Business Unit Name", bugroup | outputcsv splunk_master
Then, we can examine the results (now with more desirable results):
This command allows us to set the number of duplicate events to be kept based on the values of a field; in other words, we can use it to drop duplicates from our event results for a selected field. The event returned for each value of the dedup field is the first event found. If you provide a number directly after the dedup command, it is interpreted as the number of events to keep for each value; if you don't specify a number, dedup keeps only the first occurring event and removes all subsequent duplicates.
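For instance, reusing the earlier example, the following sketch would keep up to three events for each Business Unit value rather than just the first:
sourcetype=csv "Current Forecast" "Direct" | rename 650693NLR001 as "Business Unit" |
dedup 3 "Business Unit" | Table "Business Unit", "Business Unit Name", bugroup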
The dedup command also lets you sort by field or list of fields. This will remove all the duplicates and then sort the results based on the specified sort-by field. Adding a sort in conjunction with the dedup command can affect the performance as Splunk performs the dedup operation and then sorts the results as a final step. Here is a search command using dedup:
sourcetype=csv "Current Forecast" "Direct" | rename 650693NLR001 as "Business Unit" |
dedup "Business Unit" sortby bugroup | Table "Business Unit", "Business Unit Name", bugroup | outputcsv splunk_master
The result of the preceding command is shown in the following screenshot:
Now, we have our CSV lookup file (splunk_master.csv, written by the outputcsv command) generated and ready to be used:
Look for your generated output file in $SPLUNK_HOME/var/run/splunk.
With a Splunk static lookup, your search reads through a file (a table) that was created or updated before the search executes. With a dynamic lookup, the lookup values are generated at the time the search executes; this is possible because Splunk can execute an external command or script as part of your Splunk search.
At the time of writing this book, Splunk directly supports only Python scripts for external lookups. If you are not familiar with Python, it is a widely used general-purpose, high-level programming language (its implementation began in 1989) that is often used as a scripting language but is also used in a wide range of non-scripting contexts.
Keep in mind that any external resources (such as a file) or scripts that you want to use with your lookup will need to be copied to a location where Splunk can find them, typically $SPLUNK_HOME/etc/apps/<app_name>/bin for a specific app or $SPLUNK_HOME/etc/searchscripts for global use.
The following sections describe the process of using the dynamic lookup example script that ships with Splunk (external_lookup.py).
Just like with static lookups, Splunk makes it easy to define a dynamic or external lookup using the Splunk web interface. First, click on Settings and then select Lookups:
On the Lookups page, we can select Lookup table files to define the CSV file that will serve as the input file for our Python script. In the Add new page, we enter the following information:
The Add new page is shown in the following screenshot:
Now, click on Save. The lookup file (shown in the following screenshot) is a text CSV file that needs to (at a minimum) contain the two field names that the Python (py) script accepts as arguments, in this case, host and ip. As mentioned earlier, this file needs to be copied to $SPLUNK_HOME/etc/apps/<app_name>/bin.
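At its simplest, that file might contain nothing more than a header row naming the two fields (any data rows would come from your own environment):
host,ip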
Next, from the Lookups page, select Lookup definitions and then click on New. This is where you define your external lookup. Enter the following information:
The following screenshot describes a new lookup definition:
Now, click on Save.
Again, just like with static lookups in Splunk, dynamic lookups can also be configured in the Splunk transforms.conf file:
[myLookup]
external_cmd = external_lookup.py host ip
external_type = python
fields_list = host, ip
max_matches = 200
Let's learn more about the terms here. The external_cmd setting names the command (here, the Python script) that performs the lookup, along with the field names it accepts as arguments (host and ip). The external_type setting identifies the type of script; python is the supported type for external lookups. The fields_list setting lists all of the fields supported by the lookup, and max_matches caps the number of matching entries that can be applied to a single event.
The next step is to modify the props.conf file, as follows:
[mylookup]
LOOKUP-rdns = dnslookup host ip OUTPUT ip
After updating the Splunk configuration files, you will need to restart Splunk.
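If you prefer the command line to restarting through the web interface, the restart typically looks like this:
$SPLUNK_HOME/bin/splunk restart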
The external lookup example given uses a Python (py) script named external_lookup.py, which is a DNS lookup script that can return an IP address for a given host name or a host name for a provided IP address.
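The script that ships with Splunk is not reproduced here, but to show the general shape of an external lookup script, the following is a minimal, hypothetical sketch (not the actual external_lookup.py): Splunk passes the lookup's field names as command-line arguments, sends the script a CSV on standard input with some columns filled in, and expects the same CSV back on standard output with the missing columns populated.
#!/usr/bin/env python
# Hypothetical sketch of an external (dynamic) lookup script.
# Splunk would invoke it as: external_lookup.py host ip
# It receives a CSV (with host and ip columns) on stdin and must
# write the same CSV, with the missing values filled in, to stdout.
import csv
import socket
import sys

def main():
    fields = sys.argv[1:]  # the field names Splunk passes, for example ['host', 'ip']
    reader = csv.DictReader(sys.stdin)
    writer = csv.DictWriter(sys.stdout, fieldnames=fields, extrasaction='ignore')
    writer.writeheader()
    for row in reader:
        host = row.get('host', '')
        ip = row.get('ip', '')
        try:
            if host and not ip:
                # forward lookup: host name to IP address
                row['ip'] = socket.gethostbyname(host)
            elif ip and not host:
                # reverse lookup: IP address to host name
                row['host'] = socket.gethostbyaddr(ip)[0]
        except socket.error:
            pass  # leave the row unresolved if DNS fails
        writer.writerow(row)

if __name__ == '__main__':
    main()
The real external_lookup.py is more robust, but the contract it follows (CSV in, CSV out, with the requested fields filled) is the same.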
The lookup table field in this example is named ip, so Splunk will mine all of the IP addresses found in the indexed logs' events and add the values of ip from the lookup table into the ip field in the search events. We can notice the following:
Consider the following search command:
sourcetype=tm1* | lookup dnslookup host | table host, ip
When you run this command, Splunk uses the lookup table to pass the values for the host field as a CSV file (the text CSV file we looked at earlier) into the external command script. The py script then outputs the results (with both the host and ip fields populated) and returns them to Splunk, which populates the ip field in a result table:
Output of the py script with both the host and ip fields populated
If your lookup table has a field value that represents time, you can use the time field to set up a Splunk fields lookup. As mentioned earlier, the Splunk transforms.conf file can be modified to add a lookup stanza.
For example, the following screenshot shows a file named MasteringDHCP.csv:
You can add the following code to the transforms.conf file:
[MasteringDHCP]
filename = MasteringDHCP.csv
time_field = TimeStamp
time_format = %d/%m/%y %H:%M:%S %p
max_offset_secs = <integer>
min_offset_secs = <integer>
The file parameters are defined as follows: time_field specifies which field in the lookup table contains the timestamp (here, TimeStamp); time_format is the strptime-style format string used to interpret that field; max_offset_secs is the maximum time, in seconds, that an event's timestamp can be later than the lookup entry's timestamp for a match to occur; and min_offset_secs is the corresponding minimum offset, in seconds.
Be careful with the preceding values; the offset relates to the timestamp in your lookup (CSV) file. Setting a tight (small) offset range might reduce the effectiveness of your lookup results!
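As a purely illustrative example (the offset values here are arbitrary), the following stanza would only match lookup entries whose timestamps fall within the hour before an event's timestamp:
[MasteringDHCP]
filename = MasteringDHCP.csv
time_field = TimeStamp
time_format = %d/%m/%y %H:%M:%S %p
max_offset_secs = 3600
min_offset_secs = 0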
The last step will be to restart Splunk.
Again, it's a lot easier to use the Splunk Web interface to set up our lookup. Here is the step-by-step process:
From Settings, select Lookups, and then Lookup table files:
You should receive the Successfully saved "MasterDHCP" in search message:
Now, we are ready to try our search:
sourcetype=dh* | Lookup MasterDHCP IP as "IP" | table DHCPTimeStamp, IP, UserId | sort UserId
The following screenshot shows the output:
Lookup table definitions are indicated with the attribute LOOKUP-<class> in the Splunk configuration file, props.conf, or in the web interface under Settings | Lookups | Lookup definitions.
If you use the Splunk Web interface (which we've demonstrated throughout this article) to set up or define your lookup table definitions, Splunk will prevent you from creating duplicate table names, as shown in the following screenshot:
However, if you define your lookups using the configuration files, it is important to try to keep your table definition names unique. If you do give the same name to multiple lookups, the following rules apply:
If you have defined lookups with the same stanza (that is, using the same host, source, or source type), the first defined lookup in the configuration file wins and overrides all others. If lookups have different stanzas but overlapping events, the following logic is used by Splunk:
As a proven practice, make sure that all of your lookup stanzas have unique names.
This section lists several important Splunk commands you will use when working with lookups.
The Splunk lookup command is used to manually invoke field lookups using a Splunk lookup table that is previously defined. You can use Splunk Web (or the transforms.conf file) to define your lookups.
If you do not specify OUTPUT or OUTPUTNEW, all fields in the lookup table (excluding the lookup match field) will be used by Splunk as output fields. Conversely, if OUTPUT is specified, the output lookup fields will overwrite existing fields and if OUTPUTNEW is specified, the lookup will not be performed for events in which the output fields already exist.
For example, suppose you have a lookup table specified as iptousername with (at least) two fields, IP and UserId. For each event, Splunk will look up the value of the field IP in the table and, for any entries that match, write the value of the lookup table's UserId field to the field user_name in the event. The query is as follows:
... Lookup iptousername IP as "IP" OUTPUT UserId as user_name
Always strive to perform lookups after any reporting commands in your search pipeline, so that the lookup only needs to match the results of the reporting command and not every individual event.
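As a hedged illustration (the access* source type here is hypothetical), the lookup in the following search only has to match the distinct IP values produced by stats, rather than every raw event:
sourcetype=access* | stats count by IP | lookup iptousername IP as "IP" OUTPUT UserId as user_name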
The inputlookup command allows you to load search results from a specified static lookup table. It reads in a specified CSV filename (or a table name, as specified by the stanza name in transforms.conf). If the append=t (that is, true) argument is added, the data from the lookup file is appended to the current set of results (instead of replacing it). The outputlookup command then lets us write our search results to a specified static lookup table (as long as this output lookup table is defined).
So, here is an example of reading in the MasterDHCP lookup table (as specified in transforms.conf) and writing these event results to the lookup table definition NewMasterDHCP:
| inputlookup MasterDHCP | outputlookup NewMasterDHCP
After running the preceding command, we can see the following output:
Note that we can add the append=t argument to the search in the following fashion:
| inputlookup MasterDHCP.csv | inputlookup NewMasterDHCP.csv append=t |
The inputcsv command is similar to the inputlookup command in that it loads search results, but it loads them from a specified CSV file. The filename must refer to a relative path in $SPLUNK_HOME/var/run/splunk, and if the specified file does not exist and the filename has no extension, a filename with a .csv extension is assumed. The outputcsv command lets us write our result events to a CSV file.
Here is an example where we read in a CSV file named splunk_master.csv, search for the text phrase FPM, and then write any matching events to a CSV file named FPMBU.csv:
| inputcsv splunk_master.csv | search "Business Unit Name"="FPM" | outputcsv FPMBU.csv
The following screenshot shows the results from the preceding search command:
The following screenshot shows the resulting file generated as a result of the preceding command:
Here is another example where we read in the same CSV file (splunk_master.csv), this time skipping the first 50 events and reading in at most 500 of the remaining events:
| inputcsv splunk_master start=50 max=500
Events are numbered starting with zero as the first entry (rather than 1).
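So, for instance, reading in just the first two rows of the same file would look like this:
| inputcsv splunk_master start=0 max=2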
In this article, we defined Splunk lookups and discussed their value. We also went through the two types of lookups, static and dynamic, and saw detailed, working examples of each. Various Splunk commands typically used with the lookup functionality were also presented.