Handling Nulls and Data Types

vuluvala
Participant
Posts: 7
Joined: Tue Jul 22, 2008 10:47 am

Handling Nulls and Data Types

Post by vuluvala »

Hi,
I have questions about my design approach. I am using the Modify stage in my ETL processes to handle null values for various data types, and I ended up adding a Modify stage wherever a database/file lookup is performed; that is, there is a Modify stage between the Lookup stage and the database/file stage. The default values (set in the Modify stage for null values) flow through the various stages (including the Transformer), and finally those default values are changed back to nulls in the Transformer stage before updating the database (Oracle) tables. Is this the right approach? My concern is that I am doing a lot of data manipulation to handle null values instead of letting the ETL tool manage them the way it is designed to. Even for the Change Capture stage (which captures insert/delete/edit records), I ended up adding Modify stages to handle NULL values on its two input links. If this design approach is OK, I will use the same approach for other ETL jobs.

I also noticed that if a lookup fails (either no matching record is found or the table/file is empty), the Timestamp field gets assigned "*********", which causes a problem when I try to convert the values back to NULL before sending them (through a Filter stage) to the reject file/table. Please let me know how to handle this Timestamp issue (getting ********* instead of the default value). I want to avoid the risk of redesigning all the ETL jobs if my current design approach has any major flaws. I would greatly appreciate your suggestions.
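
For reference, this is roughly what those Modify stage specifications look like; the column names and defaults below are made up for illustration, so check the exact syntax against the Parallel Job Developer's Guide. After the lookup, a specification such as

    order_amt:decimal[10,2] = handle_null(order_amt, 0)
    cust_name:string[30] = handle_null(cust_name, 'UNKNOWN')

replaces NULL with a default, and a specification such as

    order_amt = make_null(order_amt, 0)
    cust_name = make_null(cust_name, 'UNKNOWN')

in a Modify stage before the Oracle write turns the default back into NULL.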

Thanks
Suresh.
vuluvala
shamail
Participant
Posts: 10
Joined: Mon Jan 12, 2009 10:36 pm

Post by shamail »

Why don't you allow the fields to remain nullable? Is there any specific requirement that forces you to use default values for nullable fields? If not, you can probably handle the null fields towards the end of your job. Also, a timestamp field showing '*******' means you are writing an invalid value into the timestamp field.
vuluvala
Participant
Posts: 7
Joined: Tue Jul 22, 2008 10:47 am

Post by vuluvala »

Hi Shamail,
Thanks for your response. The problem with null values is that DataStage does not handle them automatically: the job either fails with a fatal run-time error or drops the records (as it says in the log), so it is recommended to handle nulls programmatically. As far as the Timestamp field is concerned, when the lookup fails (because there are no rows in the table, for example) there is no data (which is obviously invalid) for DataStage to apply the conversion to in the Modify stage, which results in "*********". I would like to know how to handle this particular situation. I hope I was able to provide enough information.

Thanks
Suresh.
shamail
Participant
Posts: 10
Joined: Mon Jan 12, 2009 10:36 pm

Post by shamail »

Records are dropped because of null values mainly when you try to perform some operation on a field and a null comes into that field. The best approach is to leave the field nullable through the body of the job and then, just before loading into the database, check whether the field is null; if it is, provide a default value, otherwise perform the required operation. A Modify stage at every point in the job is really not recommended.
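
A minimal sketch of that final check, written as Parallel Transformer derivations just before the database stage (link and column names here are hypothetical):

    If IsNull(lnk_in.order_amt) Then 0 Else lnk_in.order_amt
    If IsNull(lnk_in.cust_name) Then "UNKNOWN" Else lnk_in.cust_name

NullToValue(lnk_in.order_amt, 0) does the same thing in a single call, if your version has it.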
richdhan
Premium Member
Premium Member
Posts: 364
Joined: Thu Feb 12, 2004 12:24 am

Post by richdhan »

Hi,

Do a search. The problem is because of the difference between the incoming format and the expected format of the timestamp field.
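
For example (a sketch only; the format mask has to match what your data actually contains), an explicit conversion in a Transformer makes the two formats agree:

    StringToTimestamp(lnk_in.load_ts, "%yyyy-%mm-%dd %hh:%nn:%ss")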

HTH
--Rich
mohan bommalingaiah
Participant
Posts: 3
Joined: Tue Oct 02, 2007 5:04 am

Re: Handling Nulls and Data Types

Post by mohan bommalingaiah »

With regard to a timestamp field that allows nulls: this is a known problem.

Say you have a timestamp field which allows nulls. When you select it from the database, change that field's data type from Timestamp (26, 6) to Char(26). Any of those timestamp values that are null can then be validated and set to the character value 'NULL'.

Finally, when you want to insert the data back into the database, put a Transformer just before the insert that converts the field from Char(26) back to a Timestamp of length 26 and precision 6, and checks whether the field holds the character value 'NULL'; if it does, call SetNull(), and then insert into the database.

This way you keep the precision of your timestamp all the way from the extract, carry the character value 'NULL' for the null values, and convert that 'NULL' back to a real null with SetNull() just before the insert, when the field type is changed back to a timestamp.
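
A sketch of that final derivation in the Transformer, assuming the Char(26) column is called ts_char and that the mask matches the extracted layout:

    If lnk_in.ts_char = "NULL" Then SetNull() Else StringToTimestamp(lnk_in.ts_char, "%yyyy-%mm-%dd %hh:%nn:%ss.6")

The output column's metadata needs to be a nullable Timestamp (length 26, scale 6) for SetNull() to be accepted.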


vuluvala wrote: ... if lookup fails (either matching record is not found or table/file is empty) the Timestamp data field is getting assigned to "*********" ...
M-B
vuluvala
Participant
Posts: 7
Joined: Tue Jul 22, 2008 10:47 am

Post by vuluvala »

I thank you all for great suggestions.

thanks
Suresh
its_me48
Participant
Posts: 33
Joined: Fri Apr 29, 2005 10:09 am

same problem here..

Post by its_me48 »

This is my first Enterprise (parallel) job and I am having the exact same problem the original poster had: if the lookup fails (either no matching record is found or the table/file is empty), the Timestamp field gets assigned "*********".
I am getting ********** and the record is not inserted into the Oracle table.
I built the same job in Server edition and have no problems at all.

I thought my Lookup stage had problems, so I rebuilt the job using a Join stage instead, and I have the same problem.

I did try to make sense of some of the replies posted but found them confusing. Can someone please elaborate on what exactly I should be doing?

Thanks much,
-Sam
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

The timestamp field must be nullable on the outer input link(s) and on the output link.
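In schema terms that means something like the following (field name is hypothetical) on both links:

    record ( last_update_ts: nullable timestamp[microseconds]; )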
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.