Sunday, 13 July 2014

Performance Tuning in Informatica

Performance Tuning in Informatica:


The following can be done to improve the performance of Informatica mappings and sessions:

Lookups:

  a) Reduce the number of lookup transformations:
  b) In the look up transformation override the SQL to select only the columns that are required.
  c) Also the lookup conditions should help retrieve less number of rows.
  d) Caching look up also improves the performance. Use static cache or persistant cache.
  e) Index the lookup tables if indexing is supported. Some data warehouse appliance like netezza do not need indexing.

Joiners:

 a) Joiners take lot of memory and it is good to reduce the number of joiner transformations in the mapping.
 b) Use join in the SQL in the source qualifier and use Joiner transformation only when reading tables from two different databases and if those tables have to be joined.

Source Qualifiers and pre/post SQL :

 a) It is a good idea to put as much logic as possible in the Source Qualifier SQL overide if you are using data warehouse appliance such as Netezza. The reason being these appliances are very powerful and support parallelism at the database level which improves the performance of the  reader.
 b) You can complex SQL logic in the pre/post SQL if you are again using appliances such as netezza. The more you can reduce the data traffic flowing to the informatica server, the better will be the performance of the informatica tasks.

Aggregator transformation:

 a) Use sorted input for aggregator  transformation.
 b) As much as possible avoid aggregator transformation and try to use aggregation at the SQL in the source qualifier.

Pushdown optimization:

If you have license for pushdown optimization then use pushdown optimization to improve performance.

If you do not have license, best way to achieve pushdown optimization is to move the logic to data base using preSQL or postSQl in the sessions.

For example: If you are inserting large amount of data into a table then some logic like below in the preSQL will improve the performance. Put this code in the preSQL of the session-source.

Insert into targettable (select col1, col2, sum(col3) from srctable1, srctable2 where srctable1.col1=srctable2.col2 group by col1, col2)

At the session level, the following settings can improve the performance:


1)  Increase the number of partitions if possible to partition and read the data.
2)  Increase the cache memory.
3)  Do not go for verbose logging. The number of logged records must be less. The number of bad records should also be few. If there are lot of bad records that can impact the performance too.
4) Reduce the number of source and target connections if possible. If reading multiple tables from same database use one connection and join the tables in the source qualifier.
5) Try to use bulk writers if possible. Bulk writers provide higher write speeds because it does not log every write operation.


Other parameters, the influence informatica performance are:


1) Network bandwidth and the traffic on our network.
2) Informatica server capacity and other jobs running on the server.
3) The driver you are using. This might be a factor if reading from cloud such as amazon redshift.
4) Running multiple instances of the same workflow.