ScratchDisk Configuration

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
elavenil
Premium Member
Posts: 467
Joined: Thu Jan 31, 2002 10:20 pm
Location: Singapore

ScratchDisk Configuration

Post by elavenil »

Hi,

We use the 4-node configuration below in our production environment, and I am curious whether a multi-node scratch disk (scratch area) configuration would improve performance during SORT operations.

Here is the configuration file.

Code:

{
    node "node1"
    {
        fastname "prd_edw"
        pools ""
        resource disk "/EDW/SG1/" {pools ""}
        resource scratchdisk "/EDW/SCR1/" {pools "sort" ""}
        resource scratchdisk "/EDW/SCR2/" {pools ""}
    }
    node "node2"
    {
        fastname "prd_edw"
        pools ""
        resource disk "/EDW/SG2/" {pools ""}
        resource scratchdisk "/EDW/SCR2/" {pools "sort" ""}
        resource scratchdisk "/EDW/SCR1/" {pools ""}
    }
    node "node3"
    {
        fastname "prd_edw"
        pools ""
        resource disk "/EDW/SG3/" {pools ""}
        resource scratchdisk "/EDW/SCR1/" {pools ""}
        resource scratchdisk "/EDW/SCR2/" {pools "sort" ""}
    }
    node "node4"
    {
        fastname "prd_edw"
        pools ""
        resource disk "/EDW/SG4/" {pools ""}
        resource scratchdisk "/EDW/SCR2/" {pools ""}
        resource scratchdisk "/EDW/SCR1/" {pools "sort" ""}
    }
}
All of your expert advice is highly appreciated.

Regards
Elavenil
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Despite the fact that you have four nodes, you only have two scratchdisk directories, so there may be some contention for them. Better performance would be had with four scratchdisk directories (on separate file systems unless your disks are part of a SAN). Even better would be eight scratchdisk directories, so you get multiple I/O channels per node.

You did, however, pick up the concept of "round robin" allocation, which will be better than not doing so.
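
A configuration along these lines, with one dedicated scratchdisk file system per node, might look like the sketch below. This is only an illustration: the /EDW/SCR3 and /EDW/SCR4 paths are hypothetical, and each path should sit on its own file system (or set of spindles) for the contention benefit to materialize.

```
{
    node "node1"
    {
        fastname "prd_edw"
        pools ""
        resource disk "/EDW/SG1/" {pools ""}
        resource scratchdisk "/EDW/SCR1/" {pools "sort" ""}
    }
    node "node2"
    {
        fastname "prd_edw"
        pools ""
        resource disk "/EDW/SG2/" {pools ""}
        resource scratchdisk "/EDW/SCR2/" {pools "sort" ""}
    }
    node "node3"
    {
        fastname "prd_edw"
        pools ""
        resource disk "/EDW/SG3/" {pools ""}
        resource scratchdisk "/EDW/SCR3/" {pools "sort" ""}
    }
    node "node4"
    {
        fastname "prd_edw"
        pools ""
        resource disk "/EDW/SG4/" {pools ""}
        resource scratchdisk "/EDW/SCR4/" {pools "sort" ""}
    }
}
```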
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
elavenil
Premium Member
Posts: 467
Joined: Thu Jan 31, 2002 10:20 pm
Location: Singapore

Post by elavenil »

Thanks Ray for your response on this.

Though 4 nodes are configured, all four of them are on the same SAN.

Is it good to configure the scratch area on all four nodes instead of on two?

All of your expert advice is highly appreciated.

Regards
Elavenil
PaulVL
Premium Member
Posts: 1315
Joined: Fri Dec 17, 2010 4:36 pm

Post by PaulVL »

I guess an important question to ask is:

Are you experiencing BAD performance?

Is your job spending most of its time in a SORT?

==========

Just because you "could" get better performance by writing to a separate physical drive per compute node doesn't mean that will improve the speed of your job.

Have you exhausted all other avenues of speed tweaks?
elavenil
Premium Member
Posts: 467
Joined: Thu Jan 31, 2002 10:20 pm
Location: Singapore

Post by elavenil »

Yes. We are seeing slow performance during the Sort operation in the job.

We are reviewing all avenues to improve performance. While reviewing the config file, we noticed this configuration and wanted to get expert opinions on it.
jwiles
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm
Contact:

Post by jwiles »

What's the volume of data you are regularly sorting (bytes, not number of records) with a single job? If it's in the multi-gigabyte range, I would suggest changing the Restrict Memory Buffer option in the Sort stage to a higher value. The default is 20MB, which results in 10MB scratch work files. Increase by an order of magnitude (from 20, change to 200-256MB) and see if that helps.
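
In the Sort stage GUI this is the "Restrict Memory Usage (MB)" option. If you work at the osh level instead, the same limit is, as I understand it, controlled by the tsort operator's -memory option, in megabytes. A minimal sketch (the key name CUSTOMER_ID is hypothetical):

```
tsort -key CUSTOMER_ID -memory 256
```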

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.
elavenil
Premium Member
Posts: 467
Joined: Thu Jan 31, 2002 10:20 pm
Location: Singapore

Post by elavenil »

Thanks for your recommendation.

The source file is 10GB; after an initial filter removes 1.5 to 2GB of data, 8 to 8.5GB is sorted.

The Restrict Memory Buffer option is currently set to 50MB.

Let me try this option to see the impact on sorting, and I will share the results soon.

Regards
Saravanan
Johnny0638
Participant
Posts: 5
Joined: Tue May 26, 2015 8:27 am

Post by Johnny0638 »

Hi, could you tell me how to change the Restrict Memory Buffer?
We run the DS job as an osh script; which environment variable can change the size of the Restrict Memory Buffer? Thanks!
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Johnny0638 wrote:We run the DS job as an osh script; which environment variable can change the size of the Restrict Memory Buffer?
The global memory for tsort operators is set by environment variable APT_TSORT_STRESS_BLOCKSIZE. Note that this sets the memory allocated for all tsort operators.
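
For example, it could be exported in dsenv or defined as a job-level environment variable. The value of 256 below assumes the variable is interpreted in megabytes; verify the unit against your version's documentation before relying on it:

```shell
# Hypothetical setting: raise the global tsort memory block size
# (assumed unit: MB) in the environment before invoking the osh script.
export APT_TSORT_STRESS_BLOCKSIZE=256
echo "$APT_TSORT_STRESS_BLOCKSIZE"
```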
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Post Reply