
A ride through the world's best ETL tool – Informatica PowerCenter

  • 25 min read
  • 30 Dec 2014


In this article by Rahul Malewar, author of the book Learning Informatica PowerCenter 9.x, we will go through the basics of Informatica PowerCenter. Informatica Corporation (Informatica), a multi-million dollar company incorporated in February 1993, is an independent provider of enterprise data integration and data quality software and services. The company offers a range of enterprise data integration products, including PowerCenter, PowerExchange, enterprise data integration, data quality, master data management, business-to-business (B2B) data exchange, application information lifecycle management, complex event processing, ultra messaging, and cloud data integration.

Informatica PowerCenter is Informatica's most widely used tool across the globe for data integration. It helps integrate data from almost any business system in almost any format, and this flexibility is what makes it the most widely used tool in the data integration world.


Informatica PowerCenter architecture

PowerCenter has a service-oriented architecture that provides the ability to scale services and share resources across multiple machines. This lets multiple machines access a single licensed installation running on a remote machine. High availability functionality helps minimize service downtime due to unexpected failures or scheduled maintenance in the PowerCenter environment. The Informatica architecture is divided into two sections: server and client.


The server is the basic administrative unit of Informatica, where we configure all services, create users, and assign authentication. The repository, nodes, the Integration Service, and code pages are some of the important components we configure while working on the server side of Informatica PowerCenter. The client is the graphical interface provided to the users. It includes PowerCenter Designer, PowerCenter Workflow Manager, PowerCenter Workflow Monitor, and PowerCenter Repository Manager.

The best place to download the Informatica software for training purposes is Oracle's E-Delivery website (edelivery.oracle.com). Once you download the files, extract the zipped archives. After the extraction finishes, install the server part first and then the client part of PowerCenter.

The minimum requirement for installing Informatica PowerCenter is to have a database installed on your machine, because Informatica uses database space to store system-related information and the metadata of the code you develop in the client tools.

Informatica PowerCenter client tools

The Informatica PowerCenter Designer client tool is where you work with source files and source tables, and likewise with targets. The Designer allows you to import or create flat files and relational database tables. Informatica PowerCenter supports both types of flat files: delimited and fixed-width. In delimited files, the values are separated from each other by a delimiter; any character or number can be used, but for better readability a special character is usually chosen. Because each value is separated from the next by the delimiter, the width of each field is not mandatory. In fixed-width files, the width of each field is fixed and the values are separated by the fixed size of the columns defined; extraction issues can arise if the size of each column is not maintained properly.

The Designer also allows you to create mappings using sources, targets, and transformations. Mappings contain sources, targets, and transformations connected to each other through links. A group of transformations that can be reused is called a mapplet, another important aspect of the Informatica tool. Transformations are the most important aspect of Informatica; they allow you to manipulate the data based on your requirements. There are various types of transformations available in Informatica PowerCenter, and every transformation performs a specific function.


Various transformations in Informatica PowerCenter

The following are the various transformations in Informatica PowerCenter:

  • Expression transformation is used for row-wise manipulation. For any manipulation you wish to perform on an individual record, use the Expression transformation. It accepts row-wise data, manipulates it, and passes it on; the data comes in through input ports and goes out through output ports. Use the Expression transformation for any row-wise calculation, such as concatenating names, computing a total salary, or converting text to uppercase.
  • Aggregator transformation is used for calculations that apply aggregate functions to a column, as opposed to the Expression transformation, which works row by row. You can use aggregate functions such as SUM, AVG, MAX, and MIN. When you use an Aggregator transformation, the Integration Service stores the data temporarily in cache memory; the cache is needed because data flows through Informatica row by row while the Aggregator's calculations are column-wise, so the rows must be held somewhere before the result can be produced. Using the Group By option, you can get the aggregate result per group. It is always recommended to pass sorted input to the Aggregator, as this enhances performance: with sorted input the Integration Service stores comparatively less data in the cache and emits the result of each group as soon as all the data for that group has been received, whereas with unsorted input it must cache all the data, which takes more time. Note that the Aggregator transformation does not sort the data; if your data is unsorted, use a Sorter transformation first and then pass the sorted output to the Aggregator. (A conceptual sketch contrasting row-wise and column-wise processing follows this list.)
  • Sorter transformation is used to sort the data in ascending or descending order based on a single key or multiple keys. Apart from ordering the data, you can also use the Sorter to remove duplicates via the distinct option in its properties. The Sorter can remove duplicates only if the complete record is a duplicate, not just a particular column.
  • Filter transformation is used to remove unwanted records from the mapping. You define the filter condition in the Filter transformation; based on it, records are either passed further along the mapping or dropped. The default condition is TRUE. For each record for which the condition returns True, the Filter lets the record pass; for each record that returns False, it drops the record. It is always recommended to place the Filter transformation as early as possible in the mapping for better performance.
  • Router transformation is a single-input-group, multiple-output-group transformation. A Router can be used in place of multiple Filter transformations: it accepts the data once through its input group and, based on the output groups you define and the filter condition in each, sends the data out through multiple output groups. It is always recommended to use a Router in place of multiple Filters in a mapping to enhance performance.
  • Rank transformation is used to get a specific number of top or bottom records based on a key. When you create a Rank transformation, a default output port called RANKINDEX comes with it; using the RANKINDEX port is not mandatory.
  • Sequence Generator transformation is used to generate a sequence of unique numbers. The values generated depend on the properties you define: the start value, the increment-by value, and the end value. The Sequence Generator has only two ports, NEXTVAL and CURRVAL, both of which are output ports; it has no input ports, and you cannot add or delete ports. It is recommended that you always use the NEXTVAL port first, and use the CURRVAL port only if NEXTVAL is already utilized. You can define the value of CURRVAL in the properties of the Sequence Generator transformation.
  • Joiner transformation is used to join two heterogeneous sources, though you can also join data from the same source type. The basic criterion for joining the data is a matching column in both sources. The Joiner has two pipelines: one is called the master and the other the detail. There is no left or right join as in a SQL database. It is always recommended to make the source with fewer records the master and the other the detail, because the Integration Service picks each record from the master source and scans for the corresponding record in the detail source; with fewer records in the master, fewer scans are needed, which enhances performance. The Joiner transformation supports four types of joins: normal join, full outer join, master outer join, and detail outer join.
  • Union transformation is used to merge the data from multiple sources. Union is a multiple-input, single-output transformation, the opposite of the Router transformation discussed earlier. The basic criterion for using a Union is that the incoming data must have matching data types; if the data coming from the multiple sources does not match in data type, the Union transformation will not work. The Union merges the data from multiple sources without removing duplicates, that is, it acts like a SQL UNION ALL. It reads the data concurrently from the multiple sources and processes it, and the sources can be heterogeneous.
  • Source Qualifier transformation acts as a virtual source in Informatica. When you drag a relational table or flat file into the Mapping Designer, a Source Qualifier transformation comes along with it. The Source Qualifier is the point where Informatica processing actually starts; the extraction process begins from the Source Qualifier.
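
The row-wise versus column-wise distinction between the Expression, Filter, and Aggregator transformations can be pictured with a small sketch. This is plain Python pseudologic, not PowerCenter expression code, and the column names are made up for illustration.

```python
# Conceptual sketch (not PowerCenter code): row-wise transformations
# (Expression, Filter) versus the column-wise Aggregator.
rows = [
    {"dept": "HR", "first": "asha", "last": "k", "salary": 4000},
    {"dept": "HR", "first": "ravi", "last": "m", "salary": 5000},
    {"dept": "IT", "first": "john", "last": "d", "salary": 7000},
]

# Expression transformation: manipulate each record independently.
for row in rows:
    row["full_name"] = (row["first"] + " " + row["last"]).upper()

# Filter transformation: pass a record only when the condition is TRUE.
passed = [row for row in rows if row["salary"] > 4500]

# Aggregator transformation: the result depends on a whole group of rows,
# so the rows must be cached (here, a dict keyed by the group-by column).
totals = {}
for row in passed:
    totals[row["dept"]] = totals.get(row["dept"], 0) + row["salary"]

print(totals)  # {'HR': 5000, 'IT': 7000}
```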

Lookup transformation is used to look up a source, Source Qualifier, or target to get the relevant data. You can look up flat files or relational tables. The Lookup transformation works along similar lines to the Joiner, with a few differences; for example, a Lookup does not require two sources. Lookup transformations can be connected or unconnected. The transformation extracts data from the lookup table or file based on the lookup condition. When you create a Lookup transformation, you can configure it to cache the data: the lookup data from the file or table is read into a cache once, and then, based on the condition defined, the lookup returns the output value. Since the data is stored internally, processing becomes faster because the lookup condition does not have to be checked against the file or database for every row; the Integration Service queries the cache memory instead. The cache is created automatically and is also deleted automatically once the processing is complete.

A Lookup transformation has four different types of ports. Input ports (I) receive data from other transformations and are used in the lookup condition; you need at least one input port. Output ports (O) pass data out of the Lookup transformation to other transformations. Lookup ports (L) are the ports whose data you wish to bring into the mapping; each column is assigned as both a lookup and an output port when you create the transformation. If you delete a lookup port from a flat file lookup source, the session will fail; if you delete a lookup port from a relational lookup table, the Integration Service extracts data only for the remaining lookup ports, which helps reduce the data extracted from the lookup source. The return port (R) is used only in an unconnected Lookup transformation; it indicates which value you wish to return from the lookup, and you can define only one port as the return port. The return port is not used in a connected Lookup transformation.
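
As a rough mental model of a cached lookup, the sketch below builds the cache once and then probes it for each incoming row. It is ordinary Python, not PowerCenter code; the table and column names are hypothetical.

```python
# Conceptual sketch (not PowerCenter code): a cached lookup.
# The lookup source is read once into memory; each incoming row is then
# matched against the cache instead of re-querying the table or file.
lookup_rows = [
    {"cust_id": 1, "cust_name": "Acme"},
    {"cust_id": 2, "cust_name": "Globex"},
]

# Build the cache once (lookup condition: cust_id = in_cust_id).
cache = {r["cust_id"]: r["cust_name"] for r in lookup_rows}

orders = [{"order_id": 10, "in_cust_id": 2},
          {"order_id": 11, "in_cust_id": 3}]
for order in orders:
    # Return value of the lookup (None when no match is found).
    order["cust_name"] = cache.get(order["in_cust_id"])

print(orders)  # order 10 resolves to 'Globex'; order 11 has no match
```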

The cache is temporary memory that is created when you execute the process. It is created automatically when the process starts and is deleted automatically once the process is complete. The amount of cache memory is decided by the property you define at the transformation level or session level. You usually leave this property at its default so the cache can grow as required; if the size required for caching the data exceeds the cache size defined, the process fails with an overflow error. There are different types of caches available for the Lookup transformation.

You can set a session property to build the cache either sequentially or concurrently. When you choose to build the cache sequentially, the Integration Service caches the data row by row as records enter the Lookup transformation: when the first record arrives, the lookup cache is created and the matching record from the lookup table or file is stored in it. This way the cache stores only matching data, which saves cache space by not storing unnecessary rows. When you choose to build the cache concurrently, the Integration Service does not wait for data to flow from the source; it first caches the complete lookup data and only then allows the source data to flow. Concurrent caching performs better than sequential caching, since the scanning happens internally against the data already stored in the cache.

You can also configure the cache to be saved permanently. By default, the cache is non-persistent, that is, it is deleted once the session run is complete. If the lookup table or file does not change across session runs, you can reuse an existing persistent cache. A cache is said to be static if it does not change with changes happening in the lookup table; the static cache is not synchronized with the lookup table. By default, the Integration Service creates a static cache: the lookup cache is created as soon as the first record enters the Lookup transformation, and the Integration Service does not update the cache while it is processing the data. A cache is said to be dynamic if it changes with changes happening in the lookup table; the dynamic cache is synchronized with the lookup table, and you can enable this in the Lookup transformation properties. With a dynamic cache, the cache is again created as soon as the first record enters the transformation, but the Integration Service keeps updating it while processing the data: it marks a record as insert for a new row added to the dynamic cache, marks a record as update when it is updated in the cache, and marks a record as unchanged when nothing changes.

Update Strategy transformation is used to INSERT, UPDATE, DELETE, or REJECT records based on a condition defined in the mapping. It is mostly used when you design mappings for SCDs. When you implement an SCD, you decide how you wish to maintain historical data alongside the current data: no history, complete history, or partial history. You can control this either through a property defined in the Session task or by using an Update Strategy transformation. When you use the Session task property, you instruct the Integration Service to treat all records in the same way, that is, either insert, update, or delete. When you use an Update Strategy transformation in the mapping, the control is no longer with the Session task; the transformation lets you insert, update, delete, or reject each record based on the requirement. You need to use the following functions to perform the corresponding operation (a conceptual sketch of this flagging logic follows the list):

  • DD_INSERT: This can be used when you wish to insert the records. It is also represented by numeric 0.
  • DD_UPDATE: This can be used when you wish to update the records. It is also represented by numeric 1.
  • DD_DELETE: This can be used when you wish to delete the records. It is also represented by numeric 2.
  • DD_REJECT: This can be used when you wish to reject the records. It is also represented by numeric 3.
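
To make the flagging concrete, here is a minimal Python sketch of the kind of decision an Update Strategy expression encodes. The numeric codes mirror the DD_* constants listed above; the change-detection rule and the key set are purely illustrative, not the wizard-generated logic.

```python
# Conceptual sketch (not PowerCenter code): flagging rows the way an
# Update Strategy expression does.
DD_INSERT, DD_UPDATE, DD_DELETE, DD_REJECT = 0, 1, 2, 3

existing_keys = {101, 102}          # keys already present in the target

def flag_row(row):
    if row["cust_id"] not in existing_keys:
        return DD_INSERT            # new key -> insert
    if row.get("deleted"):
        return DD_DELETE            # marked for deletion
    if row.get("changed"):
        return DD_UPDATE            # existing key with changes -> update
    return DD_REJECT                # nothing to do -> drop the row

print(flag_row({"cust_id": 103}))   # 0, that is, DD_INSERT
```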

Normalizer transformation is used in place of a Source Qualifier transformation when you wish to read data from a COBOL copybook source. The Normalizer is also used to convert column-wise data to row-wise data, similar to the transpose feature of MS Excel. You can use this feature if your source is a COBOL copybook file or a relational database table. The Normalizer converts columns to rows and also generates an index for each converted row.
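
The column-to-row pivot can be sketched in a few lines of Python. This is not PowerCenter code; the quarterly sales columns and the generated index are hypothetical stand-ins for what the Normalizer produces.

```python
# Conceptual sketch (not PowerCenter code): the column-to-row pivot a
# Normalizer performs, with a generated index per output row.
source = [
    {"store": "S1", "q1_sales": 100, "q2_sales": 150, "q3_sales": 120},
]

normalized = []
for rec in source:
    for idx, col in enumerate(["q1_sales", "q2_sales", "q3_sales"], start=1):
        normalized.append({"store": rec["store"],
                           "quarter": idx,        # generated index
                           "sales": rec[col]})

print(normalized)  # one input row becomes three output rows
```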

Stored Procedure transformation works with a database component. A stored procedure is a set of SQL instructions that accepts a set of input values and returns output values, and Informatica uses stored procedures much as it uses database tables: just as you import or create database tables, you can import or create a stored procedure in a mapping. To use a stored procedure in a mapping, it must already exist in the database. Similar to the Lookup transformation, a stored procedure can be a connected or an unconnected transformation in Informatica. When you use a connected stored procedure, you pass values to it through links; when you use an unconnected stored procedure, you pass values using the :SP function.

Transaction Control transformation allows you to commit or roll back individual records based on a condition. By default, the Integration Service commits data based on the properties you define at the Session task level: using the commit interval property, the Integration Service commits or rolls back the data in the target. For example, if you define a commit interval of 10,000, the Integration Service commits the data after every 10,000 records. When you use a Transaction Control transformation, you gain control at the level of each record and can commit or roll back there. You define the condition in the expression editor of the Transaction Control transformation; when you run the process, the data enters the transformation row by row, and the transformation evaluates each row and commits or rolls back the data accordingly.
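
The difference between an interval-based commit and a per-row transaction decision can be sketched as follows. This is illustrative Python, not PowerCenter code; the "last row of department" rule and the row flags are assumptions made up for the example.

```python
# Conceptual sketch (not PowerCenter code): session commit interval versus
# the per-row decision a Transaction Control transformation makes.
COMMIT_INTERVAL = 10_000

def run_with_interval(rows, write):
    for i, row in enumerate(rows, start=1):
        write(row)
        if i % COMMIT_INTERVAL == 0:
            print("commit")           # commit every N rows

def run_with_transaction_control(rows, write):
    for row in rows:
        write(row)
        # Illustrative condition: commit at the end of each department.
        if row.get("last_row_of_dept"):
            print("commit")           # commit-after style decision
        elif row.get("bad"):
            print("rollback")         # rollback-after style decision

rows = [{"dept": "HR", "last_row_of_dept": False},
        {"dept": "HR", "last_row_of_dept": True}]
run_with_transaction_control(rows, write=lambda r: None)  # prints "commit" once
```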

Classification of Transformations

The transformations we discussed are classified along two dimensions: active/passive and connected/unconnected. The active/passive classification is based on the number of records at the input and output of the transformation. If a transformation does not change the number of records between its input and output, it is said to be passive; if it does change the number of records, it is said to be active. A transformation that changes the sequence of records passing through it is also active, as in the case of the Union transformation. A transformation is said to be connected if it is linked to a source, a target, or another transformation by at least one link; if it is not connected by any link, it is classified as unconnected. Only the Lookup and Stored Procedure transformations can be either connected or unconnected; all other transformations are connected.

Advanced features of the Designer screen

Among the advanced features of the PowerCenter Designer tool, the debugger helps you debug mappings to find errors in your code. Informatica PowerCenter provides this utility so you can easily locate an issue in the mapping you created; using the debugger, you can see the flow of every record across the transformations.


Another feature is the target load plan, which allows you to load data into multiple targets in the same mapping while maintaining their constraints. Reusable transformations allow you to reuse a transformation across multiple mappings; just as sources and targets are reusable components, transformations can be made reusable too.

When you work with any technology, it is always advisable to keep your code dynamic, which means using hard-coded values as little as possible. It is recommended that you use parameters or variables in your code so you can pass these values in without frequently changing the code. In Informatica, this is achieved by using a parameter file. The value of a variable can change between session runs, whereas the value of a parameter remains constant across session runs. The difference is subtle, so define a parameter or a variable carefully according to your requirements.
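
As a rough illustration of what such a parameter file can look like, here is a small example. The folder, workflow, session, and parameter names are hypothetical, and the exact layout should be verified against your PowerCenter version's documentation.

```
[MyFolder.WF:wf_daily_load.ST:s_m_load_customers]
$$LoadDate=2014-12-30
$$Region=EMEA
$InputFile_Customers=/data/in/customers.dat
$DBConnection_Target=TGT_ORACLE
```

Here, names prefixed with $$ are mapping parameters or variables and names prefixed with $ are session parameters, following the usual convention.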

Informatica PowerCenter allows you to compare objects present within a repository. You can compare sources, targets, transformations, mapplets, and mappings in PowerCenter Designer under the Source Analyzer, Target Designer, Transformation Developer, Mapplet Designer, and Mapping Designer, respectively. You can compare objects in the same repository or across multiple repositories.

The tracing level in Informatica defines how much data is written to the session log when you execute the workflow. The tracing level is an important aspect of Informatica, as it helps in analyzing errors and finding bugs in the process. You can define the tracing level in every transformation; the option is present in each transformation's properties window. There are four tracing levels available:

  • Normal: When you set the tracing level to normal, Informatica stores status information, information about errors, and information about skipped rows. You get detailed information, but not at the individual row level.
  • Terse: When you set the tracing level to terse, Informatica stores error information and information about rejected records. Terse tracing occupies less space than normal.
  • Verbose initialization: When you set the tracing level to verbose initialization, Informatica stores, in addition to everything in normal tracing, process details related to startup, details about the index and data files created, and more detail about the transformation process. This tracing level takes more space than normal and terse.
  • Verbose data: This is the most detailed tracing level. It occupies the most space and takes the longest time compared to the other three. It stores row-level data in the session log, writes truncation information whenever it truncates data, and also writes data to the error log if you enable row error logging.

The default tracing level is normal. You can change the tracing level to terse to enhance performance. The tracing level can be defined at the individual transformation level, or you can override it by defining it at the session level.

Informatica PowerCenter Workflow Manager

The Workflow Manager screen is the second and last phase of our development work. In the Workflow Manager, session tasks and workflows are created, which are used to execute mappings. The Workflow Manager screen also lets you work with various connections, such as relational and FTP connections. Basically, the Workflow Manager contains a set of instructions that we define as a workflow, and the basic building blocks of a workflow are tasks. Just as we have multiple transformations on the Designer screen, we have multiple tasks on the Workflow Manager screen. When you create a workflow, you add tasks to it as per your requirement and execute the workflow to see its status in the monitor.


A workflow is a combination of multiple tasks connected with links that trigger in the proper sequence to execute a process. Every workflow contains a Start task along with other tasks. When you execute the workflow, you actually trigger the Start task, which in turn triggers the other tasks connected in the flow. Every task performs a specific function, so you use a task based on the functionality you need to achieve.

Various tasks in Workflow Manager

The following are the tasks in Workflow Manager:

  • Session task is used to execute a mapping. Each Session task can execute a single mapping. You need to define the path/connection of the source and target used in the mapping, so the session can extract the data from the defined path and send it to the mapping for processing.
  • Email task is used to send success or failure email notifications. You can configure your Outlook or mailbox with the Email task to send notifications directly.
  • Command task is used to execute Unix scripts/commands or Windows commands.
  • Timer task is used to add a time gap or delay between two tasks. The Timer task has properties related to absolute time and relative time.
  • Assignment task is used to assign a value to a workflow variable.
  • Control task is used to control the flow of the workflow by stopping or aborting it in case of an error. You can control the flow of the complete workflow using the Control task.
  • Decision task is used to check the status of multiple tasks and thereby control the execution of the workflow.
  • Link task, as against the Decision task, can only check the status of the previous task.
  • Event Wait task is used to wait for a particular event to occur. It is usually used as a file-watcher task: using the Event Wait task, we can keep looking for a particular file and then trigger the next task once it arrives.
  • Event Raise task is used to trigger a particular event defined in the workflow.

Advanced Workflow Manager

The Workflow Manager screen has some very important features, such as scheduling and incremental aggregation, which allow easier and more convenient processing of data. Scheduling lets you run a workflow at a specified time, so you need not run it manually every time; the scheduler takes care of it. Incremental aggregation and partitioning are advanced features that allow you to process data faster.

When you run the workflow, the Integration Service extracts the data row by row from the source path/connection you defined in the session task and makes it flow through the mapping. The data reaches the target through the transformations you defined in the mapping. Data always flows row by row in Informatica, regardless of your calculations or manipulations; so if you have 10 records in the source, there will be 10 source-to-target flows while the process executes.

Informatica PowerCenter Workflow Monitor

The Workflow Monitor screen allows you to monitor the workflows executed in the Workflow Manager. It lets you check the status and log files for a workflow; using the generated logs, you can easily find and rectify errors. The Workflow Monitor also shows statistics for the number of records extracted from the source and the number of records loaded into the target, along with statistics on error records and bad records.

Informatica PowerCenter Repository Manager

The Repository Manager screen is the fourth client screen, which is basically used for migration (deployment) purposes. This screen is also used for some administration-related activities, such as configuring the server with the client and creating users.

Performance Tuning in Informatica PowerCenter

Performance tuning covers the optimization of the various components of the Informatica PowerCenter tool, such as sources, targets, mappings, sessions, and systems. At a high level, performance tuning involves two stages: finding the issues, called bottlenecks, and resolving them. Informatica PowerCenter has features such as pushdown optimization and partitioning for better performance. With well-defined steps and coding best practices, performance can be enhanced drastically.

Slowly Changing Dimensions

Using your understanding of the different client tools, you can implement the data warehousing concept called SCD: slowly changing dimensions. Informatica PowerCenter provides wizards that allow you to easily create the different types of SCDs, that is, SCD1, SCD2, and SCD3.

  • Type 1 Dimension mapping (SCD1): It keeps only current data and does not maintain historical data.
  • Type 2 Dimension/Version Number mapping (SCD2): It keeps current as well as historical data in the table. SCD2 allows you to insert new and changed records using a new column (PM_VERSION_NUMBER) that maintains a version number in the table to track the changes. A new column, PM_PRIMARYKEY, is used to maintain the history. (A conceptual sketch of this version-number logic follows the list.)
  • Type 2 Dimension/Flag mapping: It keeps current as well as historical data in the table. SCD2 allows you to insert new and changed records using a new column (PM_CURRENT_FLAG) that maintains a flag in the table to track the changes. A new column, PRIMARY_KEY, is used to maintain the history.
  • Type 2 Dimension/Effective Date Range mapping: It keeps current as well as historical data in the table. SCD2 allows you to insert new and changed records using two new columns (PM_BEGIN_DATE and PM_END_DATE) that maintain a date range in the table to track the changes. A new column, PRIMARY_KEY, is used to maintain the history.
  • Type 3 Dimension mapping: It keeps current as well as partial historical data in the table; only partial history is maintained by adding a new column.
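
As a rough picture of the SCD2 version-number approach, the sketch below inserts a new row with an incremented version whenever a tracked attribute changes. It is plain Python, not the wizard-generated mapping; the PM_PRIMARYKEY and PM_VERSION_NUMBER column names follow the list above, while the customer and city attributes are made up.

```python
# Conceptual sketch (not PowerCenter code): SCD2 with a version number.
dimension = []      # target dimension table
next_key = [1]      # surrogate-key counter

def load_scd2(src_row):
    history = [d for d in dimension if d["cust_id"] == src_row["cust_id"]]
    if not history:
        version = 1                                        # brand-new customer
    elif history[-1]["city"] != src_row["city"]:
        version = history[-1]["PM_VERSION_NUMBER"] + 1     # changed record
    else:
        return                                             # unchanged: skip
    dimension.append({"PM_PRIMARYKEY": next_key[0],
                      "PM_VERSION_NUMBER": version,
                      **src_row})
    next_key[0] += 1

load_scd2({"cust_id": 1, "city": "Pune"})
load_scd2({"cust_id": 1, "city": "Mumbai"})
print(dimension)    # two rows for cust_id 1, versions 1 and 2
```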

Summary

With this, we have discussed the complete PowerCenter tool in brief. PowerCenter is the best-fit tool for data of any size and type that you wish to handle, and it provides compatibility with a wide range of files and databases for processing. The transformations available allow you to manipulate data in any form you wish, and the advanced features simplify your work by providing convenient options. PowerCenter can make your life easier and can offer a great career path if you learn it properly, as the Informatica PowerCenter tool is in huge demand in the job market and is one of the highly paid technologies in IT. Just grab a book and start walking the path; the end will be a great career. We are always available for help: for any help with installation or any issues related to PowerCenter, you can reach me at [email protected].
