Friday, 3 February 2017

Sample parameter file and using parameter file in Informatica

Sample parameter file and using parameter file in Informatica

Below is a sample of a simple parameter file. All the global variables used by all the sessions are defined under [Global] and the session specific variable overrides are defined under session names.






The parameter file can be set at session level or at workflow level. The worklevel parameter applies to all the session unless it is over riden by another parameter file at session level.


Thursday, 3 November 2016

Find all columns with only nulls in SQL Server table

Find all column fields with only nulls in SQL Server table

The piece of code below will give list of all columns that have only nulls in a give table. Replace YourTableName with your tablename in the code below. This is variation of code that I found here ( 

CREATE TABLE #NullColumns (ColumnName Varchar(100))
DECLARE @TableName Varchar(100)
SET @TableName='YourTableName'
OPEN crs
DECLARE @name sysname
    EXEC('INSERT INTO #NullColumns (ColumnName) SELECT ''' + @name + ''' WHERE NOT EXISTS (SELECT * FROM ' + @TableName + ' WHERE ' + @name + ' IS NOT NULL)')
    FETCH NEXT FROM crs INTO @name
SELECT * FROM #NullColumns
DROP TABLE #NullColumns


Wednesday, 26 October 2016

Static vs Dynamic Lookup in Informatica

Difference between Static vs Dynamic Lookup In Informatica

Static Lookups: 

These are are used when you want to lookup a value on a static table or a SQL query based on certain matching conditions. It caches the values when the static lookup in created at the begining of the execution of the session. If the values in the lookup tables are changing and you want the changed values to be reflected during execution of the session then you need to use dynamic cache. 
When you create a lookup transformation it by defaults creates a static cache. Static lookups can be connected or unconnected depending on weather you want multiple or single value to be returned from the lookup.  Check out unconnected transformation here.

Dynamic Lookups: 

These kind of lookups are required when you want the changes happening in the lookup table to be reflected during the execution of the mapping/session. You can create dynamic lookups by setting Dynamic lookup Cache property in the properties section of the lookup transformation. When you make the lookup dynamic, you will observe in ports tab that a new field called NewLookupRow is created automatically. This NewlookupRow port will indicate if a row has been inserted or updated in the cache. In the dynamic lookup the records gets inserted or updated in the cache depending on weather the rows existed or not existed in the cached. 

Use of NewLookupRow in Dynamic Cache:

NewLookupRow port can have the following values:-
0 - Indicates that powercenter did not update or insert the row in the cache.
1 - Indicates that powecenter inserted the row into the cache because it is a new row
2-  Indicates that powecenter updated the row in the cache because the lookup value has changed.

Using the above values you can decide using update or filter transformation what you want to do with those rows coming of the source. Like for example if you want to insert it into the target table or update it? If NewLookupRow=1 then insert it to the target table or if it equal to 2 then update the record in the target table.


Monday, 24 October 2016

How to Loop in SQL server?

How to Loop in SQL server? 

The below example shows how to loop in microsoft SQL server and insert records in chunks: This makes inserts faster.

SET IDENTITY_INSERT dbo.table1 ON /* This is required if you are adding data to Identity columns */

DECLARE @cnt INT = 2006;

WHILE @cnt < 2016
select * from table1 where [Reporting_Period_SID]= @cnt
  SET @cnt = @cnt + 1


Selecting top 3 (n) or Max 3 (n) records for each group by condition

Use the example to select top 3 records for each group. The row number is broken by the Sales_Reporting_Period and then is used to filter the top 3 records.

SELECT Company_code, Run_sid, Curr_ind, Sales_Reporting_Period from
(SELECT          AF.Company_Code,
                BB.Run_SID ,
              ,BB.Sales_Reporting_Period , ROW_NUMBER()
    over (
        PARTITION BY Sales_Reporting_Period
        order by Sales_Reporting_Period, BB.Run_Sid Desc
    ) AS RowNo
                       [dbo].[Run_Dim] BB,
                       [dbo].[My_Reserve_Fact] AF
              AF.Run_SID = BB.Run_SID
              group by
                     ,BB.Run_SID)Q1 where RowNo <=3


Thursday, 6 October 2016

SQL server and azure troubleshooting - common issues

 SQL server and azure troubleshooting - common issues

1) Msg 245, Level 16, State 1, Line 2

Conversion failed when converting the varchar value '000005EW84' to data type int.

Resolution: this issue happens when there is character in the column that you are converting to integer. Use a case statement like below to remove those records that have a column. In the example below policy_number is the column that has characters.

case when isnumeric(policy_number) = 1 then cast(policy_number as int) else 0 end as policy_num

2) Cannot insert explicit value for identity column in table 'Mytable_Dim' when IDENTITY_INSERT is set to OFF.

Resolution: You're inserting values for OperationId that is an identity column.

You can turn on identity insert on the table like this so that you can specify your own identity values.


/*Note the column list is REQUIRED here, not optional*/
VALUES      (20,
             ''My data")


3)Msg 515, Level 16, State 2, Line 45

Cannot insert the value NULL into column 'Party1_SID', table 'dbo.Party_Dim'; column does not allow nulls. INSERT fails.

Resolution: The column Party1_Sid does not accept null values.  Insert a value of -1 to the column whenever the value is null. Isnull() function can be used to test if a column value is null.

4) Msg 120, Level 15, State 1, Line 45 The select list for the INSERT statement contains fewer items than the insert list.

Resolution:The number of SELECT values must match the number of INSERT columns.
Match the number of columns on select and insert statements..

5) Msg 264, Level 16, State 1, Line 45

The column name 'Party1_SID' is specified more than once in the SET clause or column list of an INSERT.

Resolution: Remove any duplication of columns in the query. A column cannot be assigned more than one value in the same clause. Modify the clause to make sure that a column is updated only once. If this statement updates or inserts columns into a view, column aliasing can conceal the duplication in your code.

6) Arithmetic overflow error converting numeric to data type numeric.

Resolution: Increase the numeric precision. If there are datatype such as  numeric(11,2) you might have to make it high so that all the values can fit into the datatype. For example, make it numeric(20,2).

7) Msg 1033, Level 15, State 1, Procedure test_vw, Line 56
The ORDER BY clause is invalid in views, inline functions, derived tables, subqueries, and common table expressions, unless TOP, OFFSET or FOR XML is also specified.

 Resolution: Remove order by from the view statement.



Thursday, 25 August 2016

Improving performance of Inserts, updates and deletes in SQL server

How to make Inserts , updates and deletes faster in SQL server  - Microsoft Azure?

I tried to delete some records from a table with 100 millions records and it took forever. One of the reason being there was no index on the column I used in the where clause of the delete statement and other reason being the table was so big. So deleting a big chunk of the data from such big tables can take forever since the data gets written to transaction logs and the table has to scan for all the records that match the where clause condition.

How to delete records quickly from huge tables in sql server on Azure? 

1) Use only the column that has index defined in the where clause condition. May be this will help find the records to be deleted quickly.
2) Delete records in small chunks using the loop below. This prevents transaction logs from growing rapidly.  @@rowcount always returns the number of records affected from delete.

DELETE TOP (100000) from dbo.mytable WHERE name='dwbi'
WHILE @@rowcount > 0
  DELETE TOP (100000) from dbo.mytable
    WHERE name='dwbi'
3)  Insert data into temp table with no index on it using the columns that need not be deleted , delete the original table, rename the temp table back to the original table and add back the index on the newly renamed temp table.

select * into tmp_mytable from mytable where name!='dwbi'
drop mytable
sp_rename 'tmp_mytable', 'mytable'

How to insert records quickly from huge tables in sql server on Azure? 

My records to insert records into a huge table improved a lot after I disabled the non clustered indexes and rebuild it after the insert. The steps are below:

Insert into [dbo].[mytable] select * from [dbo].[stg_table]


How to update records quickly from huge tables in sql server on Azure? 

My  usual way of updating records used to be something like this:

update dbo.mytablename set name= (
from TMP_table_name
where dbo.mytablename.party_sid=TMP_table_name.party_sid)

The above update statement takes a lot of time because it updates all the records even those that do not match. The easier and faster way to update is using the statement below:  This will update only the matching records and it is pretty quicker.

update dbo.mytablename set
from TMP_table_name
where dbo.mytablename.party_sid=TMP_table_name.party_sid


Friday, 8 July 2016

Performance tuning of Informatica Big Data Edition Mapping

Performance tuning of Informatica Big Data Edition Mapping

Below are the list of performance tuning steps that can be done in Informatica Big Data Edition:

1)  When using a look up transformation only when the lookup table is small. Lookup data is copied to each node and hence it is slow.

2) Use Joiners instead of lookup for large data sets.

3) Join large data sets before small datasets. Reduce the number of times the large datasets are joined in Informatica BDE.

4) Since Hadoop does not allow updates, you will have to rebuild the target table whenever the record is updated in a target table. Instead of rebuilding  the whole table, consider rebuilding only the impacted partitions.

5) Hive slower with any non string data type. It needs to create temp tables to do the conversion to and from the non string data type to string data type. Use non string data type only when required.

6) Use the data type precision close the actual data. Using higher precision slows down the performance of Informatica BDE.

7) Map only the ports that are required in the mapping transformation or loaded to target. Less number of ports means better performance and less data reads.


Workarounds for Mapping variables and parameters and sequence generators and sorters in BDE

Mapping variables and parameters and sequence generators and sorters in BDE

Since there are no mapping variables/parameters and sequence generators in BDE, you can use the following workarounds:

For mapping variables and parameters, you can use a control table or a files instead and read the control tables or files in the mapping and use the content of the table/files in your mapping. Create a look up on the control table to get values for all the parameters defined in the control table. You can update the control table if the parameter needs to be updated at the end of the run.

For Sequence generator, you can use UUID (Unique Identified) functions instead. These UUID functions are alphanumeric and if you need numeric only then use Java functions.

HSQL does sorting by default. i.e Hadoop does the sorting and so you do not need sorter unless you are using it with a aggregator that has a sorted input. In this case you need to add a sorter to validate the mapping.

Thursday, 7 July 2016

Useful Queries for troubleshooting amazon redshift


Here are some of my queries for troubleshooting in amazon redshift. I have collected this from different sources.


select a.userid, cast(u.usename as varchar(100)), a.query, a.label,, a.starttime, b.duration,
b.duration/1000000 as duration_sec, b.query as querytext
from stv_inflight a, stv_recents b, pg_user u
where = and a.userid = u.usesysid

select pid, trim(user_name), starttime, substring(query,1,20) from stv_recents where status='Running'


cancel <pid>

You can get pid from one of the queries above used to check running queries.


select * from STL_ALERT_EVENT_LOG
where query = 1011
order by event_time desc
limit 100;


select trim(pgdb.datname) as Database, trim(pgn.nspname) as Schema,
trim( as Table, b.mbytes, a.rows
from ( select db_id, id, name, sum(rows) as rows from stv_tbl_perm a group by db_id, id, name ) as a
join pg_class as pgc on pgc.oid =
join pg_namespace as pgn on pgn.oid = pgc.relnamespace
join pg_database as pgdb on pgdb.oid = a.db_id
join (select tbl, count(*) as mbytes
from stv_blocklist group by tbl) b on
order by b.mbytes desc, a.db_id,;


analyze <tablename>;
analyze compression <tablename>;


select "column", type, encoding
from pg_table_def where tablename = 'biglist';


select * from stl_load_errors

select * from stl_load_commits

select query, trim(filename) as file, curtime as updated, *
from stl_load_commits
where query = pg_last_copy_id();


select d.query, substring(d.filename,14,20),
d.line_number as line,
substring(d.value,1,16) as value,
substring(le.err_reason,1,48) as err_reason
from stl_loaderror_detail d, stl_load_errors le
where d.query = le.query
and d.query = pg_last_copy_id();


select owner as node, diskno, used, capacity
from stv_partitions
order by 1, 2, 3, 4;
select query, trim(querytxt) as sqlquery
from stl_query
order by query desc limit 5;


To resize the redshift cluster (node type and number of nodes always required):

aws  redshift modify-cluster --cluster-identifier <cluster name> --node-type dw2.8xlarge --number-of-nodes 3

To get filelist on S3:

aws s3 ls $BUCKET/  > ./filecount.out

To get status of cluster and other information of cluster in text format:

aws redshift describe-clusters --output text   


Thursday, 28 April 2016

How to run stored procedure in Informatica cloud

How to run stored procedure in Informatica cloud?

There are multiple ways to run stored procedure in Informatica cloud.

If you want to call a stored procedure by passing an input value and reading the output from the stored procedure and process it in a mapping then it is better to use SQL transformation in Informatica cloud mapping designer as shown in picture below:

If you want to read the output of a stored procedure as a source in a mapping then you can call the stored procedure a query in the source as shown in screenshot below. The exec sp_help stored procedure is called in the source query. The stored procedure should return some data for this method to work. You might have to define the output variables in the stored procedure.

If you want to just execute a stored procedure once before or after executing a mapping then you can call the stored procedure a pre or post processing command in the mapping configuration as shown in the screenshot below. SQLCMD of sql server is used to execute the stored procedure. The output of the stored procedure is written to an output file. -b flag is used to return error code and make the preprocessing command fail when the stored procedure fails. Use session property  Preprocessing command failure and set it to stop if you want failure to stop the processing of other jobs.