REG. DATASETS
Posted: Thu Nov 29, 2007 3:35 am
by milandesai82
PROBLEM DESCRIPTION:
I am using Data Sets in my jobs instead of sequential files to save time on reads and writes.
However, two identical files are being created in the following path, which is causing a space issue on the server: "/local/data1/IBM/InformationServer/Server/Datasets".
Posted: Thu Nov 29, 2007 1:56 pm
by ray.wurlod
Welcome aboard.
Spend some time learning about configuration files, in which you specify the location of the Data Set data files.
The pathname you have given is not a file, as you state; it's a directory in which data files are written. It's the default location because it is guaranteed to exist when DataStage server is installed.
Posted: Fri Nov 30, 2007 10:15 am
by milandesai82
ray.wurlod wrote: Welcome aboard.
Spend some time learning about configuration files, in which you specify the location of the Data Set data files.
The pathname you have given is not a file, as you state; it's a ...
Thanks for your response.
Let me clarify: the path I gave is where my Data Set files are created. The problem is that DataStage is creating TWO IDENTICAL FILES, when it should create only one.
One more thing: I am really happy to have my first reply from you. I worked at RELIANCE for 2+ years in DSS and have heard a lot about you.
Posted: Fri Nov 30, 2007 1:57 pm
by ray.wurlod
Are they really identical, or merely the same size? On a two-node configuration file you would expect there to be two data files for a data set (at least for one over 128KB). On a four-node configuration you would expect there to be four data files for the Data Set, and so on. With a round robin partitioning algorithm, you would expect the files to be the same size (plus or minus a block or so).
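One way to check whether two data files are really identical, rather than merely the same size, is to compare their sizes first and then their checksums. The following is a minimal Python sketch of that check. The function names are illustrative only and are not part of DataStage; you would pass in the paths of the two data files you suspect are duplicates.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def compare_files(path_a, path_b):
    """Classify two files as 'different size', 'same size only' or 'identical'."""
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return "different size"
    if file_digest(path_a) == file_digest(path_b):
        return "identical"
    return "same size only"
```

A result of "same size only" is what you would expect from the data files of one Data Set written with round robin partitioning; "identical" suggests the data really has been duplicated.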
If the files really are identical then somewhere in your job - perhaps through inappropriate choice of partitioning algorithm - you have generated two copies of your data. This could happen in a Lookup File Set with the partitioning algorithm set to Entire, for example.
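To illustrate the difference between the two cases, here is a toy Python simulation. It is not how the parallel engine is implemented, just a sketch of the idea: round robin deals rows out to the nodes in turn, so each partition is about the same size but holds different rows, while Entire gives every node a full copy of the data.

```python
def round_robin(rows, nodes):
    """Deal rows out to partitions in turn, one partition per node."""
    parts = [[] for _ in range(nodes)]
    for i, row in enumerate(rows):
        parts[i % nodes].append(row)
    return parts


def entire(rows, nodes):
    """Give every node a complete copy of the rows."""
    return [list(rows) for _ in range(nodes)]


rows = list(range(10))

rr = round_robin(rows, 2)
print([len(p) for p in rr], rr[0] == rr[1])   # [5, 5] False

en = entire(rows, 2)
print([len(p) for p in en], en[0] == en[1])   # [10, 10] True
```

On two nodes, round robin gives two partitions of equal size with different contents, whereas Entire gives two partitions that are true duplicates and take twice the space.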
Posted: Tue Dec 04, 2007 3:55 am
by milandesai82
ray.wurlod wrote: Are they really identical, or merely the same size? On a two-node configuration file you would expect there to be two data files for a data set (at least for one over 128KB). On a four-node configur ...

Thanks. I am facing one more situation: when I run my job for the first time, it creates fresh Data Set files. When I run the job again, it should ideally overwrite the old files, but DataStage is creating a new set of files instead. Kindly guide me on this.
Posted: Tue Dec 04, 2007 5:59 am
by ray.wurlod
Are you using the same configuration file? Are you specifying Overwrite or Append? Does the job score include a composite operator that incorporates deletion of the Data Set?