(1) Running out of scratch space (2) Link Sort

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

rsamuel1181
Premium Member
Posts: 17
Joined: Tue Dec 27, 2011 2:53 pm

(1) Running out of scratch space (2) Link Sort

Post by rsamuel1181 »

Hello All

I have two questions, both around DataStage partitioning tweaks that I have encountered.

I have a few jobs that look like this in general:

------------------------
------------------------
------------------------
------------------------
------------------------
------------------------ ---->Sort1---->Sort2---->Transformer1---------->Sort3----->Sort4---->Transformer2---->Dataset
------------------------
------------------------
------------------------
------------------------
------------------------


The big box on the left has a few stages: reading input files/tables, joins, CDC to capture changes, a few transformers that manipulate data based on change_code, and also a few sort+sort+transformer sequences that group and process records using a key_change column. I have link sorts on the Joins and CDCs.

The stages following the box:
(Sort+Sort+Transformer)

Sort1 --> Sort based on Key_1, Key_2
Create Key Change Column = True
Also has a link sort, hash-partitioned on the same keys as the sort stage (Question #2 below is about this link sort).

Sort2 --> Do Not Sort Previously Sorted on Key_1, Key_2
Sort Key_Change

Transformer1 --> used to manipulate records in groups of (Key_1, Key_2) using the key_change column and if-else conditions in stage variables and output derivations.
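As a sketch of that grouping pattern (variable and column names here are made up for illustration, not our real ones): stage variables are evaluated top to bottom and keep their values across rows, so something like

svIsNewGroup:   in.key_change = 1
svGroupRowNum:  If svIsNewGroup Then 1 Else svGroupRowNum + 1

gives a running row position within each (Key_1, Key_2) group that the output derivations can branch on.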


The last set of Sort+Sort+Transformer is used primarily for string aggregation, but grouped only by Key_1.

Sort3 --> Sort based on Key_1
Create Key Change Column = True
Also has a link sort, hash-partitioned on the same key as the sort stage (again, please note Question #2).

Sort4 --> Do Not Sort Previously Sorted on Key_1
Sort Key_Change

Transformer2 --> used to "string-aggregate" certain value columns grouped by Key_1 (for example, stage_var = input.col1 : input.col2 : etc., based on key_change).
Because we can't meaningfully predict the size of the aggregated columns, the column sizes are left unbound in the stage variables and in the output columns to the dataset: LongVarChar with no size.
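A minimal sketch of the Transformer2 pattern (illustrative names again; the real derivations have more conditions):

svAggList:  If in.key_change = 1 Then in.col1 Else svAggList : "," : in.col1

i.e. reset the accumulator on the first row of each Key_1 group, otherwise keep concatenating with the : operator. The completed value is written out when the group ends (LastRowInGroup() can flag that on 8.5+, if your version has it).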


Questions:
==========

Qn 1.
----
Sort3 is a killer. In the worst case I may have 20 million records, each roughly 10,000 - 300,000 bytes wide (a rough estimate), which I am aggregating into the unbound LongVarChar columns as explained under Transformer2. As the job runs, scratch space utilization shoots from about 2% to 100% and the job aborts with errors (Fatal Error: Tsort merger aborting: mergeOneRecord() punted). This happens when the records reach Sort3 and are about halfway through that stage, going by the record counts shown in Designer.

We reduced the number of logical partitions, which fixed the jobs that had fewer records and were failing for the same reason (the 20 million record job still fails). What we did:

APT_GRID_PARTITIONS = 1 (this was a higher number earlier, which meant the same scratch disk space was shared by every logical node in the config file. A value of 1 brought it down to one scratch space per logical node and fixed the issue for jobs that processed fewer records, i.e. fewer than 20 million). Each scratch disk has 100 GB.

APT_GRID_COMPUTENODES = 3 (3 is our max limit).
A sample grid config file for this setup (values 1 and 3) is given at the bottom.

We have tried Oracle string aggregation functions (like listagg, wm_concat) to avoid the sort+sort+transformer technique for grouping/processing, but they all have limitations in our context.

All the DataStage work I have done in the past fortunately worked on the default config file setup. Could there be a different combination of PARTITIONS/COMPUTENODES values that I should try? Since reducing the sharing of scratch space made a positive difference, I am curious whether there is more I could try with those two grid variables or the config file.
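For what it's worth, the abort point matches a back-of-envelope calculation using the low end of my width estimate:

20,000,000 rows x 10,000 bytes  ~= 200 GB of sort data to spill
scratch available on the mount   =  100 GB

so blowing through 100% of scratch roughly halfway through Sort3 is about what that math predicts.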


Qn 2.
----
Sort1 and Sort3 need an additional link sort on the same keys; otherwise, I have noticed that the string aggregation and the group-by processing logic in the Transformers produce erroneous output. When I analyzed it, I found the same keys falling across partitions, breaking the grouping logic, hence the differences in output. Extra sorting/partitioning on the input link of Sort1 and Sort3 solved the issue, but why do I need a link sort on a sort stage to stop keys spilling across partitions? I ask this specifically because, if I can get rid of this seemingly unwanted link sort, I might save some scratch space utilization.
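For reference, my mental model (which may be wrong, hence the question) is that the link-side partitioner and the sort are separate operators underneath, so the generated osh conceptually looks like

hash -key Key_1 -key Key_2 | tsort -key Key_1 -key Key_2

(illustrative only, not the actual score of my job). tsort orders rows within each partition only, so without the hash in front, the same key can land on different partitions even though every partition comes out perfectly sorted.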


Runtime config file:
-------------------

{
node "Conductor"
{
fastname "sfodvdshn01-prv"
pools "conductor"
resource disk "NOT USED" {pools ""}
resource scratchdisk "NOT USED" {pools ""}
}
node "ComputeN1"
{
fastname "sfodvdscn02-prv"
pools ""
resource disk "/iispool/dataset/data" {pools "" "export"}
resource scratchdisk "/scratch/ds/bank_1" {pools ""}
}
node "ComputeN2"
{
fastname "sfodvdscn02-prv"
pools ""
resource disk "/iispool/dataset/data" {pools "" "export"}
resource scratchdisk "/scratch/ds/bank_1" {pools ""}
}
node "ComputeN3"
{
fastname "sfodvdscn02-prv"
pools ""
resource disk "/iispool/dataset/data" {pools "" "export"}
resource scratchdisk "/scratch/ds/bank_1" {pools ""}
}
}
PaulVL
Premium Member
Posts: 1315
Joined: Fri Dec 17, 2010 4:36 pm

Post by PaulVL »

I'm not a developer, but I play one on TV.


How about you sort only the data you actually need to sort?

Split your data stream into two. Sort / transform what you need, then join it back at the end before the dataset.

There must be some columns that you are not manipulating and could bypass the sort / transform stages.
rsamuel1181
Premium Member
Posts: 17
Joined: Tue Dec 27, 2011 2:53 pm

Post by rsamuel1181 »

Thought about that too. I am taking only the columns I need into that sort+sort+transformer logic. In fact, I considered splitting the stream earlier and wondered whether the sorts needed on the new Join would become another bottleneck. So yes, I am already passing only the columns I actually need through that data stream. It's a good suggestion though. Thanks!
PaulVL
Premium Member
Posts: 1315
Joined: Fri Dec 17, 2010 4:36 pm

Post by PaulVL »

Do you have a secondary scratch disk mount you can use for spill over?

like

resource scratchdisk "/scratch/ds/bank_1" {pools ""}
resource scratchdisk "/scratch/ds/bank_2" {pools ""}

where that is actually a different mount, not just a subdirectory.

or maybe...

fastname "sfodvdscn03-prv"

where it would be a separate server with its own scratch to use?
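Putting those two ideas together, a node stanza could look something like this (hostname and paths are placeholders, match them to real, separate mounts on your side):

node "ComputeN4"
{
fastname "sfodvdscn03-prv"
pools ""
resource disk "/iispool/dataset/data" {pools "" "export"}
resource scratchdisk "/scratch/ds/bank_1" {pools ""}
resource scratchdisk "/scratch/ds/bank_2" {pools ""}
}

As far as I know, the engine cycles through a node's scratchdisk list as it spills, so two real mounts roughly doubles the sort headroom on that node.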
rsamuel1181
Premium Member
Posts: 17
Joined: Tue Dec 27, 2011 2:53 pm

Post by rsamuel1181 »

We are shooting for additional storage, yes. But that is only the next-best solution and won't happen immediately; a programmatic solution would be the best outcome, as you know. Right now we are doing multiple runs by splitting the input data, which is of course a lot of trouble. We are just trying all options until storage gets added. Besides, we are not going to get a whole lot of extra storage space. Thanks.
PaulVL
Premium Member
Posts: 1315
Joined: Fri Dec 17, 2010 4:36 pm

Post by PaulVL »

Well, if one job is currently blowing up what little scratch you have, you really need more, since you don't even have room to grow. You are also limiting your environment to running only this one job and nothing else at the same time.


Pitch it this way to your management:

We paid how much for software licensing on this environment?
Does skipping this additional storage keep us from meeting customer SLAs, because we now have to break the data into chunks and run multiple times sequentially?
What is our plan for growth of our data?
rsamuel1181
Premium Member
Posts: 17
Joined: Tue Dec 27, 2011 2:53 pm

Post by rsamuel1181 »

I was able to fix Sort3 from Question #1. The culprit was the unbound columns, of course. The data in each field only grows in Transformer2, due to the string aggregation, so only there do the sizes need to be unbound. I explicitly declared sizes for some of these columns going into Sort3 and that fixed it (my bad :x )
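In schema terms the change amounts to this (illustrative column names, not our real ones):

before:  record ( Key_1:int32;  Agg_Col:string; )
after:   record ( Key_1:int32;  Agg_Col:string[max=2000]; )

i.e. the columns feeding Sort3 went from unbounded LongVarChar (string) to a declared maximum length.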

Question #2 still remains though. Why would I need extra link sorts with hash partitioning on Sort1 and Sort3 to get correctly sorted and partitioned results? Shouldn't the sort stage itself take care of that?

The Stable Sort property is enabled on these Sort stages (not on the link sorts). So I am tempted to think the link sort is the actual sorter and the Sort stage is just running a stable pass on top. Is that a correct assumption? And as far as performance goes, is the stable sort an extra bump on the way (well, it is the default property) or does it ease things up?