Handling Nulls and Data Types

vuluvala
Participant
Posts: 7
Joined: Tue Jul 22, 2008 10:47 am

Handling Nulls and Data Types

Post by vuluvala »

Hi,
I have questions about my design approach. I am using the Modify stage in my ETL processes to handle null values for various data types, and I ended up adding a Modify stage wherever a database/file lookup is performed; that is, there is a Modify stage between the Lookup stage and the database/file stage. The default values (set in the Modify stage for null values) flow through the various stages (including the Transformer), and finally those default values are changed back to nulls in the Transformer stage before updating the database (Oracle) tables. Is this the right approach? My concern is that I am doing a lot of data manipulation to handle null values instead of letting the ETL tool manage them the way it is designed to. Even for the Change Capture stage (which captures insert/delete/edit records), I ended up adding Modify stages to handle NULL values on its two input links. If this design approach is OK, I will use the same approach for other ETL jobs.

I also noticed that if a lookup fails (either no matching record is found or the table/file is empty), the Timestamp field gets assigned "*********", which causes a problem when I try to convert the values back to NULL before sending them (through a Filter stage) to the reject file/table. Please let me know how to handle this Timestamp issue (getting ********* instead of the default value). I want to avoid the risk of redesigning all the ETL jobs if my current design approach has any major flaws. I would greatly appreciate your suggestions.
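
For reference, this is roughly what those Modify stage specifications look like; the column names and defaults below are made up for illustration, so check the exact syntax against the Parallel Job Developer's Guide. After the lookup, a specification such as

    order_amt:decimal[10,2] = handle_null(order_amt, 0)
    cust_name:string[30] = handle_null(cust_name, 'UNKNOWN')

replaces NULL with a default, and a specification such as

    order_amt = make_null(order_amt, 0)
    cust_name = make_null(cust_name, 'UNKNOWN')

in a Modify stage before the Oracle write turns the default back into NULL.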

Thanks
Suresh.
vuluvala
shamail
Participant
Posts: 10
Joined: Mon Jan 12, 2009 10:36 pm

Post by shamail »

Why don't you allow the fields to remain nullable? Is there any specific requirement that forces you to use default values for nullable fields? If not, you can probably handle the null fields towards the end of your job. Also, a timestamp field showing '*******' means you are writing an invalid value into the timestamp field.
vuluvala
Participant
Posts: 7
Joined: Tue Jul 22, 2008 10:47 am

Post by vuluvala »

Hi Shamail,
Thanks for your response. The problem with null values is that DataStage does not handle them automatically: the job either fails with a fatal run-time error or drops the records (as it says in the log), so it is recommended to handle nulls programmatically. As far as the Timestamp field is concerned, when the lookup fails (because there are no rows in the table, for example) there is no data (which is obviously invalid) for DataStage to apply the conversion to in the Modify stage, which results in "*********". I would like to know how to handle this particular situation. I hope I was able to provide enough information.

Thanks
Suresh.
shamail
Participant
Posts: 10
Joined: Mon Jan 12, 2009 10:36 pm

Post by shamail »

Records are dropped because of null values mainly when you try to perform some operation on a field and a null comes into that field. The best approach is to leave the field nullable through the body of the job and then, just before loading into the database, check whether the field is null; if it is, provide a default value, otherwise perform the required operation. A Modify stage at every point in the job is really not recommended.
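
A minimal sketch of that final check, written as Parallel Transformer derivations just before the database stage (link and column names here are hypothetical):

    If IsNull(lnk_in.order_amt) Then 0 Else lnk_in.order_amt
    If IsNull(lnk_in.cust_name) Then "UNKNOWN" Else lnk_in.cust_name

NullToValue(lnk_in.order_amt, 0) does the same thing in a single call, if your version has it.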
richdhan
Premium Member
Premium Member
Posts: 364
Joined: Thu Feb 12, 2004 12:24 am

Post by richdhan »

Hi,

Do a search. The problem is because of the difference between the incoming format and the expected format of the timestamp field.
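
For example (a sketch only; the format mask has to match what your data actually contains), an explicit conversion in a Transformer makes the two formats agree:

    StringToTimestamp(lnk_in.load_ts, "%yyyy-%mm-%dd %hh:%nn:%ss")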

HTH
--Rich
mohan bommalingaiah
Participant
Posts: 3
Joined: Tue Oct 02, 2007 5:04 am

Re: Handling Nulls and Data Types

Post by mohan bommalingaiah »

With regard to a timestamp field that allows nulls: this is a known problem.

Say you have a timestamp field which allows nulls. When you select it from the database, change that field's data type from Timestamp (26, 6) to Char(26). Any of those timestamp values that are null can then be validated and set to the character value 'NULL'.

Finally, when you want to insert the data back into the database, put a Transformer just before the insert that converts the field from Char(26) back to a Timestamp of length 26 and precision 6, and checks whether the field holds the character value 'NULL'; if it does, call SetNull(), and then insert into the database.

This way you keep the precision of your timestamp all the way from the extract, carry the character value 'NULL' for the null values, and convert that 'NULL' back to a real null with SetNull() just before the insert, when the field type is changed back to a timestamp.
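
A sketch of that final derivation in the Transformer, assuming the Char(26) column is called ts_char and that the mask matches the extracted layout:

    If lnk_in.ts_char = "NULL" Then SetNull() Else StringToTimestamp(lnk_in.ts_char, "%yyyy-%mm-%dd %hh:%nn:%ss.6")

The output column's metadata needs to be a nullable Timestamp (length 26, scale 6) for SetNull() to be accepted.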


vuluvala wrote: ... if lookup fails (either matching record is not found or table/file is empty) the Timestamp data field is getting assigned to "*********" ...
M-B
vuluvala
Participant
Posts: 7
Joined: Tue Jul 22, 2008 10:47 am

Post by vuluvala »

I thank you all for great suggestions.

thanks
Suresh
its_me48
Participant
Posts: 33
Joined: Fri Apr 29, 2005 10:09 am

same problem here..

Post by its_me48 »

This is my first Enterprise (parallel) job and I am having the exact same problem the original poster had: if the lookup fails (either no matching record is found or the table/file is empty), the Timestamp field gets assigned "*********".
I am getting ********** and the record is not inserted into the Oracle table.
I built the same job in Server edition and have no problems at all.

I thought my Lookup stage had problems, so I rebuilt the job using a Join stage instead, and I have the same problem.

I did try to make sense of some of the replies posted but found them confusing. Can someone please elaborate on what exactly I should be doing?

Thanks much,
-Sam
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

The timestamp field must be nullable on the outer input link(s) and on the output link.
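In schema terms that means something like the following (field name is hypothetical) on both links:

    record ( last_update_ts: nullable timestamp[microseconds]; )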
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.