In this article by James Miller, author of the book Mastering Splunk, we will discuss Splunk lookups and workflows. The topics covered include static lookups, dynamic (external) lookups, time-based lookups, and the search commands most often used with them.
Machines constantly generate data, usually in a raw form that is most efficient for processing by machines, but not easily understood by "human" data consumers. Splunk has the ability to identify unique identifiers and/or result or status codes within the data. This gives you the ability to enhance the readability of the data by adding descriptions or names as new search result fields. These fields contain information from an external source such as a static table (a CSV file) or the dynamic result of a Python command or a Python-based script.
Splunk's lookups can use information within returned events or time information to determine how to add other fields from your previously defined external data sources.
To illustrate, consider a simple Splunk static lookup that matches the business unit code in each event against an external table and adds the corresponding business unit name to the event as a new field.
So, if you have an event where the Business Unit value is equal to 999999, the lookup will add the Business Unit Name value as Corporate Office to that event.
More sophisticated lookups can be applied automatically to every search against a particular source type, call an external script rather than read a static file, or match on time values; all of these techniques are covered later in this article.
Let's take a look at an example of a search pipeline that creates a table based on IBM Cognos TM1 file extractions:
sourcetype=csv 2014 "Current Forecast" "Direct" "513500" |
rename May as "Month" Actual as "Version" "FY 2012" as Year 650693NLR001 as "Business Unit" 100000
as "FCST" "09997_Eliminations Co 2" as "Account" "451200" as "Activity" | eval RFCST= round(FCST) |
Table Month, "Business Unit", RFCST
The following table shows the results generated:
Now, add the lookup command to our search pipeline to have Splunk convert Business Unit into Business Unit Name:
sourcetype=csv 2014 "Current Forecast" "Direct" "513500" |
rename May as "Month" Actual as "Version" "FY 2012" as Year 650693NLR001 as "Business Unit" 100000 as "FCST" "09997_Eliminations Co 2"
as "Account" "451200" as "Activity" | eval RFCST= round(FCST) |
lookup BUtoBUName BU as "Business Unit" OUTPUT BUName as "Business Unit Name" | Table Month, "Business Unit", "Business Unit Name", RFCST
The lookup command in our Splunk search pipeline will now add Business Unit Name in the results table:
In this section, we will configure a simple Splunk lookup.
You can set up a lookup using the Lookups page (in Splunk Web) or by configuring stanzas in the props.conf and transforms.conf files. Let's take the easier approach first and use the Splunk Web interface.
Before we begin, we need to establish our lookup table, which will be an industry-standard comma-separated values (CSV) file. Our example converts business unit codes to more user-friendly business unit names. For example, we have the following information:
Business unit code | Business unit name
999999 | Corporate office
VA0133SPS001 | South-western
VA0133NLR001 | North-east
685470NLR001 | Mid-west
In the events data, only business unit codes are included. In an effort to make our Splunk search results more readable, we want to add the business unit name to our results table. To do this, we've converted our information (shown in the preceding table) to a CSV file (named BUtoBUName.csv):
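The file itself is not reproduced here, but given the field names used by the lookup command later in this article (BU and BUName), its contents would look something like the following sketch:
BU,BUName
999999,Corporate office
VA0133SPS001,South-western
VA0133NLR001,North-east
685470NLR001,Mid-west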
For this example, we've kept our lookup table simple, but lookup tables (files) can be as complex as you need them to be. They can have numerous fields (columns) in them.
A Splunk lookup table has a few requirements, as follows:
Now, from Splunk Web, we can click on Settings and then select Lookups:
From the Lookups page, we can select Lookup table files:
From the Lookup table files page, we can add our new lookup file (BUtoBUName.csv):
By clicking on the New button, we see the Add new page where we can set up our file by doing the following:
Then, we click on Save:
Once you click on Save, you should receive the Successfully saved "BUtoBUName" in search message:
In the previous screenshot, the lookup file is saved by default as private. You will need to adjust permissions to allow other Splunk users to use it.
Going back to the Lookups page, we can select Lookup definitions to see the Lookup definitions page:
In the Lookup definitions page, we can click on New to visit the Add new page (shown in the following screenshot) and set up our definition as follows:
Again, we should see the Successfully saved "BUtoBUName" in search message:
Now, our lookup is ready to be used:
Rather than having to code for a lookup in each of your Splunk searches, you have the ability to configure automatic lookups for a particular source type. To do this from Splunk Web, we can click on Settings and then select Lookups:
From the Lookups page, click on Automatic lookups:
In the Automatic lookups page, click on New:
In the Add New page, we will fill in the required information to set up our lookup:
The Splunk Add new page (shown in the following screenshot) is where you enter the lookup information (detailed in the previous section):
Once you have entered your automatic lookup information, you can click on Save and you will receive the Successfully saved "Business Unit to Business Unit Name" in search message:
Now, we can use the lookup in a search. For example, you can run a search with sourcetype=csv, as follows:
sourcetype=csv 2014 "Current Forecast" "Direct" "513500" |
rename May as "Month" Actual as "Version" "FY 2012" as Year 650693NLR001 as "Business Unit" 100000
as "FCST" "09997_Eliminations Co 2" as "Account" "451200" as "Activity" | eval RFCST= round(FCST) |
Table "Business Unit", "Business Unit Name", Month, RFCST
Notice in the following screenshot that Business Unit Name is populated with the user-friendly values from our lookup table, and we didn't have to add the lookup command to our search pipeline:
In addition to using the Splunk web interface, you can define and configure lookups by editing the props.conf and transforms.conf configuration files.
To set up a lookup with these files (rather than using Splunk web), we can perform the following steps:
Whenever you edit a Splunk .conf file, always edit a local version, keeping the original (system directory version) intact.
In the current version of Splunk, there are two types of lookup tables: static and external. Static lookups use CSV files, and external (which are dynamic) lookups use Python scripting.
You have to decide whether your lookup will be static (using a file) or dynamic (using script commands). If you are using a file, you'll set filename; if you are going to use a script, you'll set external_cmd (both are set in the transforms.conf file). You can also limit the number of matching entries applied to an event by setting the max_matches option (this tells Splunk to use the first <integer> entries, in file order).
I've decided to leave the default for max_matches, so my transforms.conf file looks like the following:
[butobugroup]
filename = butobugroup.csv
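If you did want to cap how many matching entries are applied to each event, you could add the max_matches setting to the same stanza; the value of 3 here is purely illustrative:
[butobugroup]
filename = butobugroup.csv
max_matches = 3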
It is okay to have multiple field lookups defined in one source lookup definition, but each lookup should have its own unique lookup name; for example, if you have multiple tables, you can name them LOOKUP-table01, LOOKUP-table02, and so on, or something perhaps more easily understood.
If you add a lookup to your props.conf file, this lookup is automatically applied to all events from searches that have matching source types (again, as mentioned earlier; if your automatic lookup is very slow, it will also impact the speed of your searches).
To illustrate the use of configuration files in order to implement an automatic lookup, let's use a simple example.
Once again, we want to convert a unique identification code for an organization's business unit into a more user-friendly descriptive name called BU Group. We will match the field bu in the lookup table butobugroup.csv with a field in our events and then add the bugroup description to the returned events.
The following shows the contents of the butobugroup.csv file:
bu, bugroup
999999, leadership-group
VA0133SPS001, executive-group
650914FAC002, technology-group
You can put this file into $SPLUNK_HOME/etc/apps/<app_name>/lookups/ and carry out the following steps:
Put the butobugroup.csv file into $SPLUNK_HOME/etc/apps/search/lookups/, since we are using the search app.
Next, add the following stanza to your transforms.conf file:
[butobugroup]
filename = butobugroup.csv
Then, add the following stanza to your props.conf file so that the lookup is applied automatically to the csv source type:
[csv]
LOOKUP-check = butobugroup bu AS 650693NLR001 OUTPUT bugroup
Finally, restart Splunk. You can (assuming you are logged in as an admin or have admin privileges) restart the Splunk server through the web interface by going to Settings, then selecting System and finally Server controls.
Now, you can run a search for sourcetype=csv (as shown here):
sourcetype=csv 2014 "Current Forecast" "Direct" "513500" |
rename May as "Month", 650693NLR001 as "Business Unit" 100000 as "FCST" | eval RFCST= round(FCST) |
Table "Business Unit", "Business Unit Name", bugroup, Month, RFCST
You will see that the field bugroup can be returned as part of your event results:
Of course, you can create CSV files from external systems (or, perhaps even manually?), but from time to time, you might have the opportunity to create lookup CSV files (tables) from event data using Splunk. A handy command to accomplish this is outputcsv (which is covered in detail later in this article).
The following is a simple example of creating a CSV file from Splunk event data that can be used for a lookup table:
sourcetype=csv "Current Forecast" "Direct" | rename 650693NLR001 as "Business Unit" |
Table "Business Unit", "Business Unit Name", bugroup | outputcsv splunk_master
The results are shown in the following screenshot:
Of course, the output table isn't quite usable, since the results have duplicates. Therefore, we can rewrite the Splunk search pipeline introducing the dedup command (as shown here):
sourcetype=csv "Current Forecast" "Direct" | rename 650693NLR001 as "Business Unit" |
dedup "Business Unit" | Table "Business Unit", "Business Unit Name", bugroup | outputcsv splunk_master
Then, we can examine the results (now with more desirable results):
This command allows us to set the number of duplicate events to be kept based on the values of a field; in other words, we can use it to drop duplicates from our event results for a selected field. The event returned for each value of the dedup field is the first event found. If you provide a number directly after the dedup command, it is interpreted as the number of events to keep for each value; if you don't specify a number, dedup keeps only the first occurring event and removes all subsequent duplicates.
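For instance, reusing the earlier example, the following sketch would keep up to three events for each Business Unit value rather than just the first:
sourcetype=csv "Current Forecast" "Direct" | rename 650693NLR001 as "Business Unit" |
dedup 3 "Business Unit" | Table "Business Unit", "Business Unit Name", bugroup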
The dedup command also lets you sort by field or list of fields. This will remove all the duplicates and then sort the results based on the specified sort-by field. Adding a sort in conjunction with the dedup command can affect the performance as Splunk performs the dedup operation and then sorts the results as a final step. Here is a search command using dedup:
sourcetype=csv "Current Forecast" "Direct" | rename 650693NLR001 as "Business Unit" |
dedup "Business Unit" sortby bugroup | Table "Business Unit", "Business Unit Name", bugroup | outputcsv splunk_master
The result of the preceding command is shown in the following screenshot:
Now, we have our CSV lookup file (splunk_master.csv, written by the outputcsv command) generated and ready to be used:
Look for your generated output file in $SPLUNK_HOME/var/run/splunk.
With a Splunk static lookup, your search reads through a file (a table) that was created or updated before the search executes. With a dynamic lookup, the lookup values are generated at the time the search executes; this is possible because Splunk can execute an external command or script as part of your Splunk search.
At the time of writing this book, Splunk directly supports only Python scripts for external lookups. If you are not familiar with Python, it is a widely used general-purpose, high-level programming language (its implementation began in 1989) that is often used as a scripting language but is also used in a wide range of non-scripting contexts.
Keep in mind that any external resources (such as a file) or scripts that you want to use with your lookup will need to be copied to a location where Splunk can find them, typically $SPLUNK_HOME/etc/apps/<app_name>/bin for a specific app or $SPLUNK_HOME/etc/searchscripts for global use.
The following sections describe the process of using the dynamic lookup example script that ships with Splunk (external_lookup.py).
Just like with static lookups, Splunk makes it easy to define a dynamic or external lookup using the Splunk web interface. First, click on Settings and then select Lookups:
On the Lookups page, we can select Lookup table files to define the CSV file that will serve as the input file for our Python script. In the Add new page, we enter the following information:
The Add new page is shown in the following screenshot:
Now, click on Save. The lookup file (shown in the following screenshot) is a text CSV file that needs to (at a minimum) contain the two field names that the Python (py) script accepts as arguments, in this case, host and ip. As mentioned earlier, this file needs to be copied to $SPLUNK_HOME/etc/apps/<app_name>/bin.
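At its simplest, that file might contain nothing more than a header row naming the two fields (any data rows would come from your own environment):
host,ip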
Next, from the Lookups page, select Lookup definitions and then click on New. This is where you define your external lookup. Enter the following information:
The following screenshot describes a new lookup definition:
Now, click on Save.
Again, just like with static lookups in Splunk, dynamic lookups can also be configured in the Splunk transforms.conf file:
[myLookup]
external_cmd = external_lookup.py host ip
external_type = python
fields_list = host, ip
max_matches = 200
Let's learn more about the terms here. The external_cmd setting names the command (here, the Python script) that performs the lookup, along with the field names it accepts as arguments (host and ip). The external_type setting identifies the type of script; python is the supported type for external lookups. The fields_list setting lists all of the fields supported by the lookup, and max_matches caps the number of matching entries that can be applied to a single event.
The next step is to modify the props.conf file, as follows:
[mylookup]
LOOKUP-rdns = dnslookup host ip OUTPUT ip
After updating the Splunk configuration files, you will need to restart Splunk.
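If you prefer the command line to restarting through the web interface, the restart typically looks like this:
$SPLUNK_HOME/bin/splunk restart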
The external lookup example given uses a Python (py) script named external_lookup.py, which is a DNS lookup script that can return an IP address for a given host name or a host name for a provided IP address.
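The script that ships with Splunk is not reproduced here, but to show the general shape of an external lookup script, the following is a minimal, hypothetical sketch (not the actual external_lookup.py): Splunk passes the lookup's field names as command-line arguments, sends the script a CSV on standard input with some columns filled in, and expects the same CSV back on standard output with the missing columns populated.
#!/usr/bin/env python
# Hypothetical sketch of an external (dynamic) lookup script.
# Splunk would invoke it as: external_lookup.py host ip
# It receives a CSV (with host and ip columns) on stdin and must
# write the same CSV, with the missing values filled in, to stdout.
import csv
import socket
import sys

def main():
    fields = sys.argv[1:]  # the field names Splunk passes, for example ['host', 'ip']
    reader = csv.DictReader(sys.stdin)
    writer = csv.DictWriter(sys.stdout, fieldnames=fields, extrasaction='ignore')
    writer.writeheader()
    for row in reader:
        host = row.get('host', '')
        ip = row.get('ip', '')
        try:
            if host and not ip:
                # forward lookup: host name to IP address
                row['ip'] = socket.gethostbyname(host)
            elif ip and not host:
                # reverse lookup: IP address to host name
                row['host'] = socket.gethostbyaddr(ip)[0]
        except socket.error:
            pass  # leave the row unresolved if DNS fails
        writer.writerow(row)

if __name__ == '__main__':
    main()
The real external_lookup.py is more robust, but the contract it follows (CSV in, CSV out, with the requested fields filled) is the same.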
The lookup table field in this example is named ip, so Splunk will mine all of the IP addresses found in the indexed logs' events and add the values of ip from the lookup table into the ip field in the search events. We can notice the following:
Consider the following search command:
sourcetype=tm1* | lookup dnslookup host | table host, ip
When you run this command, Splunk uses the lookup table to pass the values for the host field as a CSV file (the text CSV file we looked at earlier) into the external command script. The py script then outputs the results (with both the host and ip fields populated) and returns them to Splunk, which populates the ip field in a result table:
Output of the py script with both the host and ip fields populated
If your lookup table has a field value that represents time, you can use the time field to set up a Splunk fields lookup. As mentioned earlier, the Splunk transforms.conf file can be modified to add a lookup stanza.
For example, the following screenshot shows a file named MasteringDHCP.csv:
You can add the following code to the transforms.conf file:
[MasteringDHCP]
filename = MasteringDHCP.csv
time_field = TimeStamp
time_format = %d/%m/%y %H:%M:%S %p
max_offset_secs = <integer>
min_offset_secs = <integer>
The file parameters are defined as follows: time_field specifies which field in the lookup table contains the timestamp (here, TimeStamp); time_format is the strptime-style format string used to interpret that field; max_offset_secs is the maximum time, in seconds, that an event's timestamp can be later than the lookup entry's timestamp for a match to occur; and min_offset_secs is the corresponding minimum offset, in seconds.
Be careful with the preceding values; the offset relates to the timestamp in your lookup (CSV) file. Setting a tight (small) offset range might reduce the effectiveness of your lookup results!
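As a purely illustrative example (the offset values here are arbitrary), the following stanza would only match lookup entries whose timestamps fall within the hour before an event's timestamp:
[MasteringDHCP]
filename = MasteringDHCP.csv
time_field = TimeStamp
time_format = %d/%m/%y %H:%M:%S %p
max_offset_secs = 3600
min_offset_secs = 0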
The last step will be to restart Splunk.
Again, it's a lot easier to use the Splunk Web interface to set up our lookup. Here is the step-by-step process:
From Settings, select Lookups, and then Lookup table files:
You should receive the Successfully saved "MasterDHCP" in search message:
Now, we are ready to try our search:
sourcetype=dh* | Lookup MasterDHCP IP as "IP" | table DHCPTimeStamp, IP, UserId | sort UserId
The following screenshot shows the output:
Lookup table definitions are indicated with the attribute LOOKUP-<class> in the Splunk configuration file, props.conf, or in the web interface under Settings | Lookups | Lookup definitions.
If you use the Splunk Web interface (which we've demonstrated throughout this article) to set up or define your lookup table definitions, Splunk will prevent you from creating duplicate table names, as shown in the following screenshot:
However, if you define your lookups using the configuration files, it is important to try to keep your table definition names unique. If you do give the same name to multiple lookups, the following rules apply:
If you have defined lookups with the same stanza (that is, using the same host, source, or source type), the first defined lookup in the configuration file wins and overrides all others. If lookups have different stanzas but overlapping events, the following logic is used by Splunk:
As a proven practice, make sure that all of your lookup stanzas have unique names.
This section lists several important Splunk commands you will use when working with lookups.
The Splunk lookup command is used to manually invoke field lookups using a Splunk lookup table that is previously defined. You can use Splunk Web (or the transforms.conf file) to define your lookups.
If you do not specify OUTPUT or OUTPUTNEW, all fields in the lookup table (excluding the lookup match field) will be used by Splunk as output fields. Conversely, if OUTPUT is specified, the output lookup fields will overwrite existing fields and if OUTPUTNEW is specified, the lookup will not be performed for events in which the output fields already exist.
For example, suppose you have a lookup table specified as iptousername with (at least) two fields, IP and UserId. For each event, Splunk will look up the value of the field IP in the table and, for any entries that match, write the value of the lookup table's UserId field to the field user_name in the event. The query is as follows:
... Lookup iptousername IP as "IP" OUTPUT UserId as user_name
Always strive to perform lookups after any reporting commands in your search pipeline, so that the lookup only needs to match the results of the reporting command and not every individual event.
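As a hedged illustration (the access* source type here is hypothetical), the lookup in the following search only has to match the distinct IP values produced by stats, rather than every raw event:
sourcetype=access* | stats count by IP | lookup iptousername IP as "IP" OUTPUT UserId as user_name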
The inputlookup command allows you to load search results from a specified static lookup table. It reads in a specified CSV filename (or a table name, as specified by the stanza name in transforms.conf). If the append=t (that is, true) argument is added, the data from the lookup file is appended to the current set of results (instead of replacing it). The outputlookup command then lets us write our search results to a specified static lookup table (as long as this output lookup table is defined).
So, here is an example of reading in the MasterDHCP lookup table (as specified in transforms.conf) and writing these event results to the lookup table definition NewMasterDHCP:
| inputlookup MasterDHCP | outputlookup NewMasterDHCP
After running the preceding command, we can see the following output:
Note that we can add the append=t argument to the search in the following fashion:
| inputlookup MasterDHCP.csv | inputlookup NewMasterDHCP.csv append=t |
The inputcsv command is similar to the inputlookup command in that it loads search results, but it loads them from a specified CSV file. The filename must refer to a relative path in $SPLUNK_HOME/var/run/splunk, and if the specified file does not exist and the filename has no extension, a filename with a .csv extension is assumed. The outputcsv command lets us write our result events to a CSV file.
Here is an example where we read in a CSV file named splunk_master.csv, search for the text phrase FPM, and then write any matching events to a CSV file named FPMBU.csv:
| inputcsv splunk_master.csv | search "Business Unit Name"="FPM" | outputcsv FPMBU.csv
The following screenshot shows the results from the preceding search command:
The following screenshot shows the resulting file generated as a result of the preceding command:
Here is another example where we read in the same CSV file (splunk_master.csv), this time skipping the first 50 events and reading in at most 500 of the remaining events:
| inputcsv splunk_master start=50 max=500
Events are numbered starting with zero as the first entry (rather than 1).
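So, for instance, reading in just the first two rows of the same file would look like this:
| inputcsv splunk_master start=0 max=2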
In this article, we defined Splunk lookups and discussed their value. We also went through the two types of lookups, static and dynamic, and saw detailed, working examples of each. Various Splunk commands typically used with the lookup functionality were also presented.