Scratch Disk & Resource Disk Space Issue

Nagac
Premium Member
Posts: 127
Joined: Tue Mar 29, 2011 11:39 am
Location: India

Scratch Disk & Resource Disk Space Issue

Post by Nagac »

Hi

I have a DataStage job that processes a 30 GB file (14 million rows). It removes duplicates and writes the valid data to a dataset.

The job creates 230 GB of data files on the resource disk and 200 GB of temporary files on the scratch disk. Is it normal for dataset data files to grow to several times the size of the raw file? I have no transformations or extra fields in the job.
Could someone advise on this?

Thanks
Last edited by Nagac on Tue Oct 28, 2014 12:06 pm, edited 1 time in total.
PaulVL
Premium Member
Posts: 1315
Joined: Fri Dec 17, 2010 4:36 pm

Post by PaulVL »

You're asking us to analyze a job that we have no visibility into.

Kinda hard to do that since our crystal balls are not working at the moment.
Nagac
Premium Member
Posts: 127
Joined: Tue Mar 29, 2011 11:39 am
Location: India

Post by Nagac »

I am sorry if it came across that way.

What I would like to know is how much resource disk space is needed to process a 1 GB flat file (CSV). I am processing one 30 GB file, and it creates 230 GB of files in the resource disk area, even though the job has no transformations and only removes duplicates, if there are any.

Thanks
Mike
Premium Member
Posts: 1021
Joined: Sun Mar 03, 2002 6:01 pm
Location: Tampa, FL

Post by Mike »

Do you have lots of Varchar fields with large lengths? Remove the length value to change them to unbounded varchars and you should see your space consumption go way down.

Mike
Nagac
Premium Member
Posts: 127
Joined: Tue Mar 29, 2011 11:39 am
Location: India

Post by Nagac »

Thanks Mike,

Yes, I have many fields defined as nvarchar(255) and a few as nvarchar(max).

I will check it.

Would changing APT_MAX_TRANSPORT_BLOCK_SIZE or APT_MAX_DELIMITED_SIZE help with the space issue?
Mike
Premium Member
Posts: 1021
Joined: Sun Mar 03, 2002 6:01 pm
Location: Tampa, FL

Post by Mike »

Those environment variables will do nothing for your space issue.

Change your nvarchar(255) to simply nvarchar with no maximum length specified.

An nvarchar(255) physically requires 510 bytes of storage in a dataset. An nvarchar with no length specified physically requires 2 bytes per character of data plus a couple of bytes to store the length.
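
To put rough numbers on the storage figures above, here is a minimal Python sketch of the arithmetic. It is only a back-of-the-envelope estimate, not a description of the actual dataset file format. The row count comes from the original question; the column count, the average value length, and the 2-byte length prefix (the post only says "a couple of bytes") are assumptions made up for illustration.

```python
# Back-of-the-envelope estimate of dataset storage for nvarchar columns.
# Only the 2-bytes-per-character figure and the row count come from the thread;
# everything marked "assumption" or "hypothetical" is illustrative.

BYTES_PER_CHAR = 2        # per the post: nvarchar data takes 2 bytes per character
LENGTH_PREFIX_BYTES = 2   # assumption: the post only says "a couple of bytes"


def bounded_bytes(max_len: int) -> int:
    """Bytes for one nvarchar(max_len) value: the full declared width is stored."""
    return max_len * BYTES_PER_CHAR


def unbounded_bytes(actual_len: int) -> int:
    """Bytes for one unbounded nvarchar value holding actual_len characters."""
    return actual_len * BYTES_PER_CHAR + LENGTH_PREFIX_BYTES


def estimate_gb(rows: int, fields: int, bytes_per_field: int) -> float:
    """Total size in GB (10**9 bytes) for rows x fields values of the given size."""
    return rows * fields * bytes_per_field / 1e9


if __name__ == "__main__":
    rows = 14_000_000   # row count from the original question
    fields = 30         # hypothetical number of nvarchar(255) columns
    avg_chars = 20      # hypothetical average length of the actual values

    print("bounded  :", estimate_gb(rows, fields, bounded_bytes(255)), "GB")
    print("unbounded:", estimate_gb(rows, fields, unbounded_bytes(avg_chars)), "GB")
```

With these made-up inputs the bounded columns come to about 214 GB and the unbounded ones to about 18 GB, which shows how declared lengths alone could account for growth of this order. Plug in the real column count and typical value lengths to get an estimate for a specific job.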

Dataset storage changed at version 7.0.1 to favor processing speed over storage space.

Mike
Nagac
Premium Member
Posts: 127
Joined: Tue Mar 29, 2011 11:39 am
Location: India

Post by Nagac »

Thanks, Mike, for the information.