ScratchDisk Configuration

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
elavenil
Premium Member
Posts: 467
Joined: Thu Jan 31, 2002 10:20 pm
Location: Singapore

ScratchDisk Configuration

Post by elavenil »

Hi,

We use the 4-node configuration below in our production environment, and I am curious whether a multi-node scratch disk (scratch area) configuration would improve performance during SORT operations.

Here is the configuration file.

Code:

{
    node "node1"
    {
        fastname "prd_edw"
        pools ""
        resource disk "/EDW/SG1/" {pools ""}
        resource scratchdisk "/EDW/SCR1/" {pools "sort" ""}
        resource scratchdisk "/EDW/SCR2/" {pools ""}
    }
    node "node2"
    {
        fastname "prd_edw"
        pools ""
        resource disk "/EDW/SG2/" {pools ""}
        resource scratchdisk "/EDW/SCR2/" {pools "sort" ""}
        resource scratchdisk "/EDW/SCR1/" {pools ""}
    }
    node "node3"
    {
        fastname "prd_edw"
        pools ""
        resource disk "/EDW/SG3/" {pools ""}
        resource scratchdisk "/EDW/SCR1/" {pools ""}
        resource scratchdisk "/EDW/SCR2/" {pools "sort" ""}
    }
    node "node4"
    {
        fastname "prd_edw"
        pools ""
        resource disk "/EDW/SG4/" {pools ""}
        resource scratchdisk "/EDW/SCR2/" {pools ""}
        resource scratchdisk "/EDW/SCR1/" {pools "sort" ""}
    }
}
All of your expert advice is highly appreciated.

Regards
Elavenil
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Despite the fact that you have four nodes, you only have two scratchdisk directories, so there may be some contention for them. Better performance would be had with four scratchdisk directories (on separate file systems unless your disks are part of a SAN). Even better would be eight scratchdisk directories, so you get multiple I/O channels per node.

You did, however, pick up the concept of "round robin" allocation, which will be better than not doing so.
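
A configuration along these lines, with one dedicated scratchdisk file system per node, might look like the sketch below. This is only an illustration: the /EDW/SCR3 and /EDW/SCR4 paths are hypothetical, and each path should sit on its own file system (or set of spindles) for the contention benefit to materialize.

```
{
    node "node1"
    {
        fastname "prd_edw"
        pools ""
        resource disk "/EDW/SG1/" {pools ""}
        resource scratchdisk "/EDW/SCR1/" {pools "sort" ""}
    }
    node "node2"
    {
        fastname "prd_edw"
        pools ""
        resource disk "/EDW/SG2/" {pools ""}
        resource scratchdisk "/EDW/SCR2/" {pools "sort" ""}
    }
    node "node3"
    {
        fastname "prd_edw"
        pools ""
        resource disk "/EDW/SG3/" {pools ""}
        resource scratchdisk "/EDW/SCR3/" {pools "sort" ""}
    }
    node "node4"
    {
        fastname "prd_edw"
        pools ""
        resource disk "/EDW/SG4/" {pools ""}
        resource scratchdisk "/EDW/SCR4/" {pools "sort" ""}
    }
}
```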
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
elavenil
Premium Member
Posts: 467
Joined: Thu Jan 31, 2002 10:20 pm
Location: Singapore

Post by elavenil »

Thanks Ray for your response on this.

Though 4 nodes are configured, all four of them are on the same SAN.

Is it good to configure the scratch area on all four nodes instead of on two?

All of your expert advice is highly appreciated.

Regards
Elavenil
PaulVL
Premium Member
Posts: 1315
Joined: Fri Dec 17, 2010 4:36 pm

Post by PaulVL »

I guess an important question to ask is:

Are you experiencing BAD performance?

Is your job spending most of its time in a SORT?

==========

Just because you "could" get better performance by writing to a separate physical drive per compute node doesn't mean that will improve the speed of your job.

Have you exhausted all other avenues of speed tweaks?
elavenil
Premium Member
Posts: 467
Joined: Thu Jan 31, 2002 10:20 pm
Location: Singapore

Post by elavenil »

Yes. We are seeing slow performance during the Sort operation in the job.

We are reviewing all avenues to improve performance. While reviewing the config file, we noticed this configuration and wanted to get expert opinions on it.
jwiles
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm
Contact:

Post by jwiles »

What's the volume of data you are regularly sorting (bytes, not number of records) with a single job? If it's in the multi-gigabyte range, I would suggest changing the Restrict Memory Buffer option in the Sort stage to a higher value. The default is 20MB, which results in 10MB scratch work files. Increase by an order of magnitude (from 20, change to 200-256MB) and see if that helps.
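
In the Sort stage GUI this is the "Restrict Memory Usage (MB)" option. If you work at the osh level instead, the same limit is, as I understand it, controlled by the tsort operator's -memory option, in megabytes. A minimal sketch (the key name CUSTOMER_ID is hypothetical):

```
tsort -key CUSTOMER_ID -memory 256
```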

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.
elavenil
Premium Member
Posts: 467
Joined: Thu Jan 31, 2002 10:20 pm
Location: Singapore

Post by elavenil »

Thanks for your recommendation.

The source file is 10GB; after an initial filter removes 1.5 to 2GB of data, 8 to 8.5GB is sorted.

The Restrict Memory Buffer option is currently set to 50MB.

Let me try this option to see the impact on sorting, and I will share the results soon.

Regards
Saravanan
Johnny0638
Participant
Posts: 5
Joined: Tue May 26, 2015 8:27 am

Post by Johnny0638 »

Hi, could you tell me how to change the Restrict Memory Buffer?
We run the DS job as an osh script; which environment variable can change the size of the Restrict Memory Buffer? Thanks!
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Johnny0638 wrote:We run the DS job as an osh script; which environment variable can change the size of the Restrict Memory Buffer?
The global memory for tsort operators is set by environment variable APT_TSORT_STRESS_BLOCKSIZE. Note that this sets the memory allocated for all tsort operators.
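
For example, it could be exported in dsenv or defined as a job-level environment variable. The value of 256 below assumes the variable is interpreted in megabytes; verify the unit against your version's documentation before relying on it:

```shell
# Hypothetical setting: raise the global tsort memory block size
# (assumed unit: MB) in the environment before invoking the osh script.
export APT_TSORT_STRESS_BLOCKSIZE=256
echo "$APT_TSORT_STRESS_BLOCKSIZE"
```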
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Post Reply