Monday, 14 July 2014

Informatica - ETL testing -Challenges


ETL testing is a bit complicated since the complexity of testing cannot be hidden behind button or GUIs. The tester will have write SQLs on variety of databases, some times write own mappings to test the ETL code since ETL involves integration of data from heterogeneous data sources and transformation that cannot be easily coded using SQLs.

Things that might have to be tested during ETL testing:

1) The data from multiple sources is properly integrated, transformed, aggregated, sorted etc and loaded into the target database.
2) The data is properly landed, staged and loaded into mart tables.
3) Proper batch keys are inserted for delta loads and delta loads are working properly.
4) The data is inserted/updated in the target table.
5) Historical loads and daily loads are working as expected.
6) Test of events, commands tasks, schedule and notification.
7) Validation of error cases and rejected data.
8) Performance testing
9) Impact testing: Testing interfaces to upstream and downstream processes.
10) Regression testing: test if the existing data/tables/etc are not broken.

Complication in ETL testing:

a) Availability of all test data and creation of test data for all the test cases.
b) Understanding the complexities of the ETL tool and data warehousing.
c) Working with data spread across multiple databases.

Just out of interest, If you want to check out different roles in data warehousing, please read:
http://dwbitechguru.blogspot.ca/2015/09/career-and-hot-jobs-in-data-warehousing.html
How ETL testing can be done:

a) If you are testing informatica code and if testing involves only a few impacted sessions, a separate workflow can be created just for the purpose of testing. This avoids searching for the right session to run to test the code.

b) Perform a run of the ETL jobs for historical data and for daily loads separately and test both loads.

c) Test the load of data into landing area. If this is working properly, then it is easy to write SQLs to validate data against landing area since data from multiple sources are brought into one landing area.

d) If jobs involve files, make sure the files are deleted/archived and folders ready for next load.

e) Look at the ETL design document to understand how ETL is designed.

f) Make sure the ETL code can handle the data volume and meet all the performance parameters.

g) There might be some ETL testing tools that could make testing easier. Informatica has a set of etl testing tools of their own. 

See also:
http://dwbitechguru.blogspot.ca/2014/07/business-object-reports-testing.html