Designing, previewing, and running transformations
In this section, we will create a transformation that is a bit more interesting than the one you already built. In doing this, you will have a chance to learn about the process of designing transformations, while also previewing your work.
The task is as follows: you will be given a file with a list of cities in the USA, along with their zip codes and their state names. You will have to generate a file containing only the cities in the state of NY, sorted by zip code. We will split the task into the following steps:
- Designing and previewing the transformation
- Learning to deal with errors that may appear
- Saving and running the transformation
Designing and previewing a transformation
Let's start by developing the first part of the transformation. We will read the file and filter the data. In this case, the solution is quite straightforward (this will not always be the case). There is a PDI step for each of the tasks to accomplish. The CSV file input
step will serve for reading the file, and the Filter rows
step will filter the rows. The instructions are as follows:
- First, create a transformation. You can do this from the main menu, from the main toolbar, or by pressing Ctrl + T.
- From the
Input
folder that contains steps, drag and drop aCSV file input
step to the work area. - Double-click on the step. A configuration window will show up.
- Click on the
Browse...
button to locate the file. For this exercise, we will use a file that comes with the PDI software. You will find it in the following path, under the installation folder:samples\transformations\files\Zipssortedbycitystate.csv
. - Click on
Get Fields
. The grid will be filled with the columns found in the file:

Configuring a CSV file input step
- Click on
Preview
, then click onOK
. A window with sample data will appear, as shown in the following screenshot:

Sample data
- Click on
Close
to close theExamine preview data
window, and then click onOK
to close the configuration window.
Now that we have read the file, the data is available for further processing. The rows coming from the CSV file input
step will flow towards the next step, which will be the filter:
- From the
Flow
folder, drag and drop aFilter rows
step. - Click on the output connector in the
CSV file input
step to create a hop towards theFilter rows
step. - You will be prompted for the kind of hop. Select
Main output of step
, shown as follows:

Selecting a kind of hop
- Double-click on the
Filter rows
step to configure the filter. - Fill in the configuration window, as shown in the following screenshot, to indicate that we will only keep rows with states equal to
NY
:

Configuring a filter
- Close the window. The following is what you should have so far:

Simple transformation
Now, we will preview the results to see if we get what we expected:
- Make sure that the
Filter rows
step is selected.
Note
When a step is selected, its border becomes wider, as shown in the previous screenshot.
- Press F10 to preview the results. Alternatively, click on the Preview icon (the icon that looks like an eye) in the transformation toolbar. Then, click on
Quick Launch
. A window with the filtered rows will appear, as follows:

Previewing data
Note
By default, only 1,000 rows are previewed. If you want to look at more data, just click on Get more rows
.
- Click on
Stop
to stop the previewing process and close the window.
Note
As you can see in the preceding image, when you preview or run a transformation, a small window with metrics is displayed above the steps while the rows are being processed. These metrics are the same as those shown in the Steps metrics
tab in the Execution Results
window.
Understanding the logging options
PDI logs all of the executions of a transformation. By default, the level of the logging details is basic, but there are seven possible levels of logging, ranging from Nothing at all
to Rowlevel (very detailed)
, which is the most detailed level of logging. You can change the level of logging as follows:
- If you will run a transformation, in the
Execute a transformation
window, before clicking onRun
, select the proper option:

Selecting the Log level
- If you are previewing a transformation, instead of clicking on
Quick Launch
, selectConfigure
. This will show you theExecute a transformation
window. In this window, chooseLog level
, and then click onRun
.
Understanding the Step Metrics tab
Before continuing, let's observe what is happening in the Execution Results
window. You already know the Logging
tab, which displays every task that you are performing. Now, click on the Step Metrics
tab. You will see the following:

Step Metrics tab
In this tab, there is a grid with one row for each of the steps in the transformation. In this case, we have two of them: one for the CSV file input
step, and one for the Filter rows
step. The columns in the grid describe what happened in each step. The following are the most relevant columns in our example:
Read
: The number of rows coming from the previous stepWritten
: The number of rows that leave the current step toward the next oneInput
: The number of rows coming from external sources
For instance, the rows that the CSV file input
step reads from the file travel toward the Filter rows
step. In other words, the output of the CSV file input
step, displayed under the Written
column, is the input of the Filter rows
step, displayed under the Read
column.
Also, if you look at the Filter rows
line in the Steps Metrics
grid, the number under the Written
column represents the number of rows that will leave the step (that is, the rows remaining after filtering).
CSV file input
is the only step that gets data from an external source – a file. Therefore, this is the only step that has a value greater than zero in the Input
column.
The last columns in the grid – Time
, Speed (r/s)
, and input/output
– are metrics to monitor the performance of the execution. As to the rest of the columns in the grid, they will be described in later chapters.
Dealing with errors while designing
Now, we will continue working on the transformation created in the previous section. This time, we will sort the final data by ZIP code. This is a very simple task, but we will use it as a method to learn how to deal with errors that may appear while we are designing:
- From the
Transform
folder that contains steps, drag and drop aSort rows
step to the work area. - Create a hop from the
Filter rows
step to this one. Again, you will be prompted for the kind of hop. Select theMain output of step
option:

Kinds of hop leaving a Filter rows step
- Double-click on the Sort rows icon. Fill in the grid as follows:

Sorting data
- Close the window.
- Make sure that the Sort rows step is selected, and run a preview like you did before. If you followed the steps as explained, you will get an error.
There are several indications that will help you to understand that an error occurred:
- A small red icon will appear in the upper-right corner of one or more steps. These are the steps that are causing the error.
- The backgrounds of the corresponding rows in the
Step Metrics
tab will change to red:

Errors in the Step Metrics tab
- The
Logging
tab will contain text explaining the error:

Errors in the Logging tab
In this case, as stated in the log, the problem was that we were referring to a field that doesn't exist. We typed ZIPCODE
instead of POSTALCODE
. Let's fix it, as follows:
- Double-click on the
Sort rows
step and fix the name of the field - Close the window and run a preview again
- You will see the rows with states equal to NY, sorted by ZIP code
Saving and running a transformation
The last task before saving and running the transformation is to send the results to a file. This is quite easy:
- From the
Output
folder, drag and drop aText file output
step to the work area. Create a hop from theSort rows
step to this new step. Note that this time, you don't have to choose the kind of hop; a default kind of hop will be created. - Double-click on the
Text file output
step. In the configuration window, provide a name for the file that we will generate. You should specify the full path, for instance,C:/Pentaho/data/ny_cities
.
Note
You don't have to type the extension; it is automatically added, as indicated in the extension
textbox.
- Close the window.
The transformation is complete. The only task to perform now is to save it and run it, as follows:
- Save the transformation. You can do so by pressing Ctrl + S or by selecting the proper option from
Main Menu
orMain Toolbar
.
- Once the transformation has saved, you can run it. Do so by pressing F9. In the
Logging
tab of theExecution Results
window, you will see the log of the execution. If you select thePreview data
tab in the same window, you will see sample data coming from the step currently selected. As an example, click on theFilter rows
step and look at the data in thePreview data
tab. You will see all of the rows for the state of NY, although they are still out of order:

Preview data tab
- If you click on the Sort rows step, you will see the same, but ordered. Also, a file should have been created with the same information. Browse your system to look for the generated file. Its content should be something like the following:
CITY;STATE;POSTALCODE NEW YORK;NY;10001 NEW YORK;NY;10003 NEW YORK;NY;10005 ... ... ELMIRA;NY;14925 HOLTSVILLE;NY;501 FISHERS ISLAND;NY;6390
Note
If you look at the sample lines, you will note that the code 501
is between 14925
and 6390
. The codes are not sorted by number, but alphabetically. This is because the ZIP code was defined as a String
in the input step.