





















































In this article, by Rahul Malewar, author of the book Learning Informatica PowerCenter 9.x, we will go through the basics of Informatica PowerCenter. Informatica Corporation (Informatica), a multi-million dollar company incorporated in February 1993, is an independent provider of enterprise data integration and data quality software and services. The company offers a range of enterprise data integration products, which include PowerCenter, PowerExchange, enterprise data integration, data quality, master data management, business-to-business (B2B) data exchange, application information lifecycle management, complex event processing, ultra messaging, and cloud data integration.
Informatica PowerCenter is the company's most widely used tool across the globe for various data integration processes. PowerCenter helps integrate data from almost any business system in almost any format. This flexibility in handling almost any data makes it the most widely used tool in the data integration world.
PowerCenter has a service-oriented architecture that provides the ability to scale services and share resources across multiple machines. This lets multiple machines access a single licensed installation of the software on a remote machine. High availability functionality helps minimize service downtime due to unexpected failures or scheduled maintenance in the PowerCenter environment. Informatica architecture is divided into two sections: server and client.
The server is the basic administrative unit of Informatica, where we configure all services, create users, and assign authentication. The repository, nodes, Integration Service, and code pages are some of the important items we configure while working on the server side of Informatica PowerCenter. The client is the graphical interface provided to users; it includes PowerCenter Designer, PowerCenter Workflow Manager, PowerCenter Workflow Monitor, and PowerCenter Repository Manager.
The best place to download the Informatica software for training purposes is Oracle's eDelivery website (edelivery.oracle.com). Once you download the files, extract the zipped archives. After the extraction finishes, install the server part of PowerCenter first and then the client.
For installation of Informatica PowerCenter, the minimum requirement is to have a database installed on your machine, because Informatica uses space in a relational database (such as Oracle) to store system-related information and the metadata of the code you develop in the client tools.
The Informatica PowerCenter Designer client tool is where you work with source files and source tables, and similarly with targets. The Designer tool allows you to import or create flat files and relational database tables. Informatica PowerCenter supports both types of flat files, that is, delimited and fixed-width files. In delimited files, the values are separated from each other by a delimiter. Any character or number can be used as the delimiter, but usually, for better interpretation, we use special characters. In delimited files, the width of each field is not mandatory, since each value is separated from the next by the delimiter. In fixed-width files, the width of each field is fixed, and the values are separated from each other by the fixed size of the columns defined. There can be issues in extracting the data if the size of each column is not maintained properly.

The Designer tool allows you to create mappings using sources, targets, and transformations. Mappings contain sources, targets, and transformations connected to each other through links. A group of transformations that can be reused is called a mapplet; mapplets are another important aspect of the Informatica tool. Transformations are the most important aspect of Informatica, as they allow you to manipulate data based on your requirements. There are various types of transformations available in Informatica PowerCenter, and every transformation performs a specific function.
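To illustrate the difference between the two flat file types, here is a small hypothetical example of the same records in both layouts (the column names and widths are assumptions for illustration only):

```
Delimited (comma as the delimiter):
EMPNO,NAME,DEPT
101,John,Sales
102,Maria,Finance

Fixed width (EMPNO = 4, NAME = 8, DEPT = 8 characters):
101 John    Sales
102 Maria   Finance
```

Note that in the fixed-width layout, padding spaces, not a delimiter, mark where one field ends and the next begins, which is why column sizes must be maintained exactly.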
The following are some of the important transformations in Informatica PowerCenter:
Lookup transformation is used to look up data in a source, Source Qualifier, or target to get the relevant data. You can look up a flat file or a relational table. The Lookup transformation works along similar lines to the Joiner transformation, with a few differences; for example, Lookup does not require two sources. Lookup transformations can be connected or unconnected. The Lookup transformation extracts data from the lookup table or file based on the lookup condition. When you create the Lookup transformation, you can configure it to cache the data. Caching the data makes the processing faster, since the data is stored internally once the cache is created. When you choose to cache the data, the Lookup transformation caches the data from the file or table once and then, based on the condition defined, sends the output value. Since the data is stored internally, processing becomes faster because the lookup condition need not be checked against the file or database; the Integration Service queries the cache memory instead. The cache is created automatically, and it is also deleted automatically once the processing is complete.
The Lookup transformation has four different types of ports. Input ports (I) receive data from other transformations; these ports are used in the lookup condition, and you need at least one of them. Output ports (O) pass data out of the Lookup transformation to other transformations. Lookup ports (L) are the ports for which you wish to bring data into the mapping; each column is assigned as a lookup and output port when you create the Lookup transformation. If you delete a lookup port from a flat file lookup source, the session will fail; if you delete a lookup port from a relational lookup table, the Integration Service extracts data only for the remaining lookup ports, which helps reduce the data extracted from the lookup source. The return port (R) is used only in an unconnected Lookup transformation. It indicates which value you wish to return from the Lookup transformation, and you can define only one port as the return port. The return port is not used in a connected Lookup transformation.
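As a sketch of the unconnected usage, an unconnected Lookup transformation is called from another transformation (typically an Expression transformation) with the :LKP function; the lookup and port names below are hypothetical:

```
-- Output port expression in an Expression transformation; the value of
-- the lookup's return port (R) comes back as the result of this call:
:LKP.LKP_GET_DEPT_NAME(DEPT_ID)
```

The input port passed in the call (DEPT_ID here) is matched against the lookup condition, and the single return port supplies the value.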
Cache is temporary memory that is created when you execute a process. It is created automatically when the process starts and is deleted automatically once the process is complete. The amount of cache memory is decided based on the property you define at the transformation level or session level. You usually leave this property at its default value so that the cache size can grow as required. If the size required for caching the data is more than the cache size defined, the process fails with an overflow error. There are different types of caches available for the Lookup transformation. You can define a session property to create the cache either sequentially or concurrently. When you choose to create the cache sequentially, the Integration Service caches the data row by row as the records enter the Lookup transformation: when the first record enters the Lookup transformation, the lookup cache is created and stores the matching record from the lookup table or file. This way, the cache stores only matching data, which saves cache space by not storing unnecessary data. When you choose to create the cache concurrently, the Integration Service does not wait for the data to flow from the source; it caches the complete data first, and once the caching is complete, it allows the data to flow from the source. Concurrent caching enhances performance compared to sequential caching, since the scanning happens internally using the data stored in the cache. You can also configure the cache to persist the data. By default, the cache is non-persistent, that is, it is deleted once the session run is complete. If the lookup table or file does not change across session runs, you can reuse an existing persistent cache. A cache is said to be static if it does not change with the changes happening in the lookup table; the static cache is not synchronized with the lookup table.
By default, the Integration Service creates a static cache. The lookup cache is created as soon as the first record enters the Lookup transformation, and the Integration Service does not update the cache while it is processing the data. A cache is said to be dynamic if it changes with the changes happening in the lookup table; the dynamic cache is synchronized with the lookup table. You can choose to make the cache dynamic in the Lookup transformation properties. The lookup cache is created as soon as the first record enters the Lookup transformation, and the Integration Service keeps updating the cache while it is processing the data. The Integration Service marks a record as insert for a new row inserted into the dynamic cache, marks a record that is updated as update in the cache, and marks every record with no change as unchanged.
Update Strategy transformation is used to INSERT, UPDATE, DELETE, or REJECT records based on a condition defined in the mapping. The Update Strategy transformation is mostly used when you design mappings for slowly changing dimensions (SCDs). When you implement an SCD, you decide how you wish to maintain historical data along with the current data: no history, complete history, or partial history. To control this, you can either use a property defined in the session task or use an Update Strategy transformation. When you use the session task property, you instruct the Integration Service to treat all records in the same way, that is, either insert, update, or delete. When you use an Update Strategy transformation in the mapping, the control is no longer with the session task; the transformation allows you to insert, update, delete, or reject each record based on the requirement. You need to use the following functions to perform the corresponding operations:
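The Update Strategy transformation flags each row with one of the following built-in constants (each also has a numeric equivalent):

```
DD_INSERT  -- or 0: flag the row for insert
DD_UPDATE  -- or 1: flag the row for update
DD_DELETE  -- or 2: flag the row for delete
DD_REJECT  -- or 3: flag the row for reject

-- A typical update strategy expression (the port name is hypothetical):
IIF(ISNULL(EXISTING_EMP_KEY), DD_INSERT, DD_UPDATE)
```

In the sketched expression, rows that find no match on a hypothetical existing key are flagged for insert, and all others are flagged for update.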
Normalizer transformation is used in place of the Source Qualifier transformation when you wish to read data from a COBOL copybook source. The Normalizer transformation is also used to convert column-wise data to row-wise data, similar to the transpose feature of MS Excel. You can use this feature if your source is a COBOL copybook file or a relational database table. The Normalizer transformation converts columns to rows and also generates an index for each converted row.
A stored procedure is a database component. Informatica uses stored procedures in a way similar to database tables. Stored procedures are sets of SQL instructions that take a certain set of input values and return an output value. Just as you import or create database tables, you can import or create a stored procedure in a mapping; to use a Stored Procedure transformation in a mapping, the stored procedure must already exist in the database. Similar to the Lookup transformation, the Stored Procedure transformation can be connected or unconnected. When you use a connected stored procedure, you pass values to it through links; when you use an unconnected stored procedure, you pass values using the :SP function.
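As a sketch of the unconnected usage, the :SP function is called from an expression, with the PROC_RESULT variable capturing the procedure's return value (the procedure and port names below are assumptions for illustration):

```
-- Output port expression calling an unconnected Stored Procedure
-- transformation; PROC_RESULT holds the value the procedure returns:
:SP.SP_GET_TAX_RATE(STATE_CODE, PROC_RESULT)
```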
Transaction Control transformation allows you to commit or roll back individual records based on a condition. By default, the Integration Service commits data based on the properties you define at the session task level: using the commit interval property, the Integration Service commits or rolls back the data into the target. For example, if you define a commit interval of 10,000, the Integration Service commits the data after every 10,000 records. With a Transaction Control transformation, you get control at each record to commit or roll back. You define the condition in the expression editor of the Transaction Control transformation. When you run the process, the data enters the Transaction Control transformation row by row; the transformation evaluates each row and commits or rolls back the data accordingly.
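The transaction control condition is built from the built-in constants TC_CONTINUE_TRANSACTION, TC_COMMIT_BEFORE, TC_COMMIT_AFTER, TC_ROLLBACK_BEFORE, and TC_ROLLBACK_AFTER. A minimal sketch (the port names are hypothetical):

```
-- Commit the open transaction before writing the current row whenever
-- the department changes; otherwise keep the transaction open:
IIF(DEPT_ID != PREV_DEPT_ID, TC_COMMIT_BEFORE, TC_CONTINUE_TRANSACTION)
```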
The transformations we discussed are classified along two axes: active/passive and connected/unconnected. The active/passive classification is based on the number of records at the input and output ports of the transformation. If the transformation does not change the number of records between its input and output ports, it is said to be a passive transformation; if it changes the number of records, it is an active transformation. A transformation that changes the sequence of records passing through it is also active, as in the case of the Union transformation. A transformation is said to be connected if it is connected to a source, a target, or another transformation by at least one link; if it is not connected by any link, it is classed as unconnected. Only the Lookup and Stored Procedure transformations can be connected or unconnected; all other transformations are connected.
Talking about the advanced features of the PowerCenter Designer tool, the debugger is a utility that helps you debug mappings to find errors in your code. Using the debugger, you can see the flow of every record across the transformations, so you can easily find issues in the mapping you created.
Another feature is the target load plan, which allows you to load data into multiple targets in the same mapping while maintaining their constraints. Reusable transformations allow you to reuse a transformation across multiple mappings; just as sources and targets are reusable components, transformations can be reused as well.
When you work on any technology, it is always advisable for your code to be dynamic: you should use as few hard-coded values as possible. It is recommended that you use parameters or variables in your code so that you can easily pass in these values and need not change the code frequently. In Informatica, this functionality is achieved by using a parameter file. The value of a variable can change between session runs, whereas the value of a parameter remains constant across session runs. The difference is subtle, so you should choose a parameter or a variable properly as per your requirements.
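A minimal parameter file might look like the following; the folder, workflow, session, and parameter names are hypothetical. Mapping parameters and variables start with $$, while built-in session parameters such as source file names start with $:

```
[MyFolder.WF:wf_load_sales.ST:s_m_load_sales]
$$Country=US
$$LastRunDate=2014-01-01
$InputFile1=/data/incoming/sales.csv
```

The bracketed header scopes the values to one session in one workflow, so different sessions can receive different values from the same file.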
Informatica PowerCenter allows you to compare objects present within a repository. You can compare sources, targets, transformations, mapplets, and mappings in PowerCenter Designer under the Source Analyzer, Target Designer, Transformation Developer, Mapplet Designer, and Mapping Designer, respectively. You can compare objects in the same repository or across repositories.
Tracing level in Informatica defines the amount of data you wish to write to the session log when you execute the workflow. Tracing level is a very important aspect of Informatica, as it helps in analyzing errors and finding bugs in the process. You can define the tracing level in every transformation, via the tracing level option in each transformation's properties window. There are four tracing levels available: Normal, Terse, Verbose Initialization, and Verbose Data.
The default tracing level is Normal. You can change the tracing level to Terse to enhance performance. The tracing level can be defined at the individual transformation level, or you can override it by defining it at the session level.
The Workflow Manager screen is the second and last phase of our development work. In Workflow Manager, session tasks and workflows are created, which are used to execute mappings. The Workflow Manager screen allows you to work on various connections, such as relational and FTP connections. Basically, Workflow Manager contains a set of instructions that we define as a workflow, and the basic building blocks of a workflow are tasks. Just as we have multiple transformations on the Designer screen, we have multiple tasks on the Workflow Manager screen. When you create a workflow, you add tasks to it as per your requirements and execute the workflow to see its status in the monitor.
A workflow is a combination of multiple tasks connected with links that trigger in the proper sequence to execute a process. Every workflow contains a Start task along with other tasks. When you execute the workflow, you actually trigger the Start task, which in turn triggers the other tasks connected in the flow. Every task performs a specific function, and you need to use a task based on the functionality you need to achieve.
The tasks available in Workflow Manager include Session, Command, Email, Timer, Event Wait, Event Raise, Decision, Assignment, and Control.
The Workflow Manager screen has some very important features, namely scheduling and incremental aggregation, which allow easier and more convenient processing of data. Scheduling allows you to schedule a workflow at a specified time so that it runs when desired; you need not run the workflow manually every time. Incremental aggregation and partitioning are advanced features that allow you to process data faster.
When you run the workflow, the Integration Service extracts the data row by row from the source path or connection you defined in the session task and makes it flow through the mapping. The data reaches the target through the transformations you defined in the mapping. The data always flows in a row-wise manner in Informatica, no matter what your calculations or manipulations are. So, if you have 10 records in the source, there will be 10 source-to-target flows while the process is executed.
The Workflow Monitor screen allows you to monitor the workflows executed from Workflow Manager. It lets you check the status and log files for a workflow; using the logs generated, you can easily find and rectify errors. Workflow Monitor also shows statistics for the number of records extracted from the source and the number of records loaded into the target, along with statistics on error records and bad records.
The Repository Manager screen is the fourth client screen, which is basically used for migration (deployment) purposes. This screen is also used for some administration-related activities, such as configuring the server with the client and creating users.
Performance tuning covers the optimization of the various components of the Informatica PowerCenter tool, such as sources, targets, mappings, sessions, and the system. At a high level, performance tuning involves two stages: finding the issues, called bottlenecks, and resolving them. Informatica PowerCenter has features such as pushdown optimization and partitioning for better performance. With defined steps and coding best practices, performance can be enhanced drastically.
Using your understanding of the different client tools, you can implement the data warehousing concept called slowly changing dimensions (SCD). Informatica PowerCenter provides wizards that allow you to easily create the different types of SCDs, that is, SCD1, SCD2, and SCD3.
With this, we have discussed the complete PowerCenter tool in brief. PowerCenter is a good fit for any size and type of data you wish to handle, and it provides compatibility with a wide range of files and databases for processing. The transformations available allow you to manipulate data into any form you wish, and the advanced features simplify your work by providing convenient options. PowerCenter can make your life easy and offer you a great career path if you learn it properly, as the Informatica PowerCenter tool is in huge demand in the job market and is one of the highly paid technologies in the IT market. Just grab a book and start walking the path; the end will be a great career. We are always available for help: for any help with installation or any issues related to PowerCenter, you can reach me at [email protected].