Teradata Connector Stage causing skew in Data Load

abhilashnair · Post by **abhilashnair** » Wed Jan 28, 2015 2:20 am

Not sure if the below is a DS issue or a Teradata issue. Nevertheless, I am posting it here.

We are using the Teradata connector in Bulk/Load mode (this is nothing but FastLoad in Teradata parlance) to load data into an empty Teradata table with three columns. The source is a sequential file stage.

Out of the three columns in the table, two have nulls for all rows and one has values for all rows. The column that has values is the primary index of the table, and its values are unique. The table itself is a multiset table, i.e., it allows duplicates.

When the job runs, it aborts after processing 7 million rows out of 60 million total rows in the source. The error we are getting is:

Teradata_Connector,0: RDBMS code 2644: No more room in database .

When we got in touch with the DB team, they told us there is a high "skew factor" causing the space to fill up, and that we have to solve this first rather than increase the space.

Any suggestions?

chulett · Post by **chulett** » Wed Jan 28, 2015 8:08 am

Doesn't really seem like a DataStage problem to me now that I've looked up what the heck skew factor means. Seems to be more about the nature of your data and the design of the table rather than anything the connector is somehow "causing". Run the FastLoad outside of DataStage; I would wager you'll still see the problem.

Did your DBA team have any useful suggestions on how to handle / deal with the skew problem? Or did they just mutter something about DataStage, make a sign to ward off evil and then send you on your way?

abhilashnair · Post by **abhilashnair** » Wed Jan 28, 2015 11:57 am

Hmm... pretty much. The DBA team suggested having a primary index on the table, and choosing a column with unique values for it. But this is already taken care of. The table already has a primary index. In fact, the table has only 3 columns in all. Out of those, two columns are always null and the third one is non-nullable and is the primary index.

ray.wurlod · Post by **ray.wurlod** » Wed Jan 28, 2015 3:32 pm

If your primary key data values are skewed there is ABSOLUTELY NOTHING you can do about it except point this fact out to the DBAs.

nagarjuna · Post by **nagarjuna** » Fri Jan 30, 2015 4:42 am

You mentioned that the table is a multiset table. Can you check how many duplicates are present? Maybe duplicates are causing a skewed distribution, with more data landing on a particular AMP.
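
To make the duplicate check concrete, here is a rough Python sketch that could be run against the key column of the source file before loading. It is only an illustration under stated assumptions: the function names `duplicate_summary` and `skew_factor` are made up for this example, and the skew formula used, (1 - average rows per AMP / maximum rows per AMP) * 100, is one commonly quoted definition that may not match exactly what your DBA's tooling reports. The per-AMP row counts would have to come from the database itself; nothing here reproduces Teradata's own row hashing.

```python
from collections import Counter


def duplicate_summary(keys):
    """Return (total_rows, distinct_keys, most_common_key, its_count).

    `keys` is any iterable of primary index values read from the source.
    """
    counts = Counter(keys)
    if not counts:
        return 0, 0, None, 0
    total = sum(counts.values())
    key, n = counts.most_common(1)[0]
    return total, len(counts), key, n


def skew_factor(rows_per_amp):
    """Skew as a percentage, assuming the (1 - avg/max) * 100 definition.

    0 means rows are spread evenly; values near 100 mean one AMP holds
    almost everything. `rows_per_amp` is a non-empty list of row counts.
    """
    peak = max(rows_per_amp)
    if peak == 0:
        return 0.0
    mean = sum(rows_per_amp) / len(rows_per_amp)
    return (1 - mean / peak) * 100


if __name__ == "__main__":
    # Hypothetical sample values, not data from the job in this thread.
    sample_keys = ["A", "B", "B", "B", "C"]
    print(duplicate_summary(sample_keys))
    print(skew_factor([100, 100, 100, 100]))
    print(skew_factor([400, 0, 0, 0]))
```

If the distinct count comes out well below the total row count, or a single value accounts for a large share of the rows, the "unique values" assumption about the primary index column does not hold for this file, and that would be worth taking back to the DBAs.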