parallel datasets

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

PeteM2
Premium Member
Posts: 44
Joined: Thu Dec 15, 2011 9:17 am
Location: uk

parallel datasets

Post by PeteM2 »

Consider the scenario of a dataset defined as 'overwrite' in a job using 2 nodes, where each node has a separate resource disk file system.

If the configuration were increased to 4 nodes, each with its own separate resource disk file system, would the dataset be seamlessly re-created across the 4 file systems when the job is run with the new version of the config file? Or would a separate task be required to spread the dataset across the 4 file systems before the job runs with the new config file?
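For illustration, a four-node configuration file of the kind described, with a separate resource disk file system per node, might look something like this (the host name and paths are made up):

    {
        node "node1" { fastname "etlserver" pools "" resource disk "/dsdata/fs1" {pools ""} resource scratchdisk "/scratch/fs1" {pools ""} }
        node "node2" { fastname "etlserver" pools "" resource disk "/dsdata/fs2" {pools ""} resource scratchdisk "/scratch/fs2" {pools ""} }
        node "node3" { fastname "etlserver" pools "" resource disk "/dsdata/fs3" {pools ""} resource scratchdisk "/scratch/fs3" {pools ""} }
        node "node4" { fastname "etlserver" pools "" resource disk "/dsdata/fs4" {pools ""} resource scratchdisk "/scratch/fs4" {pools ""} }
    }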
thanks
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Seamless.

But you may need a manual process to clean up the old 2-node data files.
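One way to avoid leftovers altogether, assuming orchadmin is available on the engine tier (the path below is hypothetical), is to delete the dataset before the first run under the new configuration:

    # orchadmin reads the descriptor, which records where the segment
    # files were written, and removes the segments and the descriptor
    # together (orchadmin still needs a valid APT_CONFIG_FILE to start)
    orchadmin rm /dsdata/my_dataset.ds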
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
PeteM2
Premium Member
Premium Member
Posts: 44
Joined: Thu Dec 15, 2011 9:17 am
Location: uk

Post by PeteM2 »

I take it that after the job has run with the new config file, the header (descriptor) file will point to the new 4 partitions of data and will no longer reference the original 2 partitions.

Therefore, can the dataset utility easily identify these orphaned data partitions?
thanks
jwiles
Premium Member
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm

Post by jwiles »

Why do you think the data segments would be orphaned? You have overwritten the dataset, meaning the dataset has been replaced...you haven't just rewritten the descriptor file.

The most common cause of orphaned data segments is users deleting the descriptor files using standard O/S commands (rm, delete, etc.) rather than the appropriate tools (orchadmin, dataset management, dataset stage).
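As a sketch of the difference (the dataset path is hypothetical):

    # Wrong: removes only the descriptor file; the segment files on the
    # resource disks are left behind as orphans
    rm /dsdata/my_dataset.ds

    # Right: reads the descriptor first, then removes the descriptor and
    # every segment file it references
    orchadmin rm /dsdata/my_dataset.ds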

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.
priyadarshikunal
Premium Member
Posts: 1735
Joined: Thu Mar 01, 2007 5:44 am
Location: Troy, MI

Post by priyadarshikunal »

"you may need a manual process" the comment does not say that you will need it.

The 2 node dataset will be overwritten and there won't be any orphaned files as the configuration file copy in descriptor will be used to delete/overwrite the data files
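To see what a descriptor records, orchadmin's describe command prints the partition and segment information it holds (exact flags and output vary by version; the path is hypothetical):

    orchadmin describe /dsdata/my_dataset.ds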
Priyadarshi Kunal

Genius may have its limitations, but stupidity is not thus handicapped. :wink:
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

It all depends on whether the original two nodes' resource disk settings are included in the new four-node configuration file or not.
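For example, continuing with hypothetical paths: if the four-node file spreads its nodes across

    resource disk "/dsdata/fs1" {pools ""}
    resource disk "/dsdata/fs2" {pools ""}
    resource disk "/dsdata/fs3" {pools ""}
    resource disk "/dsdata/fs4" {pools ""}

and fs1 and fs2 are the two file systems the two-node file used, the overwrite can still reach the old segment files and remove them. If the four-node file names four entirely new file systems, the old segments are stranded.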
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
PeteM2
Premium Member
Posts: 44
Joined: Thu Dec 15, 2011 9:17 am
Location: uk

Post by PeteM2 »

Is it the case that, if the resource disks allocated to the 4 nodes are not the same as the original disks allocated to the 2-node configuration, then there will be orphaned data files? And if so, can the dataset utility identify these files for deletion?
thanks
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

There will be orphaned segment files in this scenario, and the Data Set utility (by which I assume you mean the Data Set Management utility) cannot identify them.
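A manual search is the usual fallback. Something along these lines can surface candidates, on the assumption that segment file names embed the dataset name; the paths and pattern are illustrative, so review the list by hand before deleting anything:

    # Look on the OLD resource disk file systems for segment files
    # belonging to the dataset; -ls only prints details, nothing is removed
    find /dsdata/fs1 /dsdata/fs2 -name 'my_dataset.ds.*' -ls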
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.