Unable to generate a node map

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

VCInDSX
Premium Member
Posts: 223
Joined: Fri Apr 13, 2007 10:02 am
Location: US

Unable to generate a node map

Post by VCInDSX »

Hi,
I have a simple job that reads files matching a wildcard pattern from a folder and loads them into a target database. I have the File Name Column property set so that the file name is stored in the target database. This works fine in a Windows DataStage Server environment, thanks of course to help from the gurus here: viewtopic.php?t=117069&highlight=APT_FileImportOperator

I migrated this to a Linux GRID environment and am running into the following error on the Sequential File stage.
main_program: For createFilesetFromPattern(), could not find any available nodes in node pool "".
SF_Input_File: At least one filename or data source must be set in APT_FileImportOperator before use.

This happens when I set $APT_IMPORT_PATTERN_USES_FILESET to TRUE.

If I set $APT_IMPORT_PATTERN_USES_FILESET to FALSE, the job runs fine, but the file names are not expanded; they are stored as /pathname/*.txt.

I tried giving the pattern a prefix, such as "Feed*.txt", and that doesn't make any difference either. It just loads the value as /pathname/Feed*.txt.

If I do an "ls" using the folder name and pattern, it lists the 2 files I have copied into the source location for testing.
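
For example, a check along these lines (with /pathname and Feed*.txt standing in for my real folder and pattern) shows both files:

cd /pathname
ls Feed*.txt     # lists the 2 test files copied into the source location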

I searched the forum for these messages and found a suggestion that it might help to supply the folder name as a job parameter and specify the pattern separately. I tried that and it did not help either.

At this time, the only way to get this to work is to set $APT_IMPORT_PATTERN_USES_FILESET to FALSE.

Please let me know if any further input from me would help you help me.
Your time and help are greatly appreciated.

Thanks,
-V
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Does your configuration file include a default node pool called ""?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
VCInDSX
Premium Member
Posts: 223
Joined: Fri Apr 13, 2007 10:02 am
Location: US

Post by VCInDSX »

Hi Ray,
Thanks for the followup.
I checked the job log and found that the config file being used has the following entries.

{
    node "node1"
    {
        fastname "ctpcqabdsh01p"
        pools ""
        resource disk "/nfsgrid/nfsbin/IBM/InformationServer/Server/Datasets" {pools ""}
        resource scratchdisk "/nfsgrid/nfsbin/IBM/InformationServer/Server/Scratch" {pools ""}
    }
    node "node2"
    {
        fastname "ctpcqabdsh01p"
        pools ""
        resource disk "/nfsgrid/nfsbin/IBM/InformationServer/Server/Datasets" {pools ""}
        resource scratchdisk "/nfsgrid/nfsbin/IBM/InformationServer/Server/Scratch" {pools ""}
    }
}

Both nodes have pools "".

Is this what you wanted verified, Ray?

Let me know if there is any other entry that I should be looking at.

Thanks,
-V
lstsaur
Participant
Posts: 1139
Joined: Thu Oct 21, 2004 9:59 pm

Post by lstsaur »

That tells me your job is NOT grid-enabled. Did you bring in all the required grid parameters?
VCInDSX
Premium Member
Posts: 223
Joined: Fri Apr 13, 2007 10:02 am
Location: US

Post by VCInDSX »

Hi lstsaur,
Thanks for reviewing my query. The following four grid parameters are the ones we were told to add to all our PX jobs in the GRID. I have created a parameter set named APT_GRID_PARAMS in my project for this purpose.
Here are the values for these entries in the log file.

APT_GRID_PARAMS.$APT_GRID_ENABLE = YES (Compiled-in default)
APT_GRID_PARAMS.$APT_GRID_COMPUTENODES = 1 (Compiled-in default)
APT_GRID_PARAMS.$APT_GRID_PARTITIONS = 1 (Compiled-in default)
APT_GRID_PARAMS.$APT_GRID_SEQFILE_HOST = (Compiled-in default)


Another thing I noticed is that the config file printed in the Director log's initial environment variable settings entry (APT_CONFIG_FILE=/nfsgrid/nfsbin/IBM/InformationServer/Server/Configurations/default.apt) points to "default.apt", which is what I posted earlier.

A few lines below that I see the following entry in the Director log for this job.
<Dynamic_gird.sh> SEQFILE Host(s): ctpcqabdsc02p: ctpcqabdsc02p:
{
    node "Conductor"
    {
        fastname "ctpcqabdsh01p"
        pools "conductor"
        resource disk "/nfsdata/data1/datasets" {pools ""}
        resource scratchdisk "/scratch" {pools ""}
    }
    node "node1_1"
    {
        fastname "ctpcqabdsc02p"
        pools ""
        resource disk "/nfsdata/data1/datasets" {pools ""}
        resource scratchdisk "/scratch" {pools ""}
    }
}

I am not fully conversant with grid internals and would appreciate your input/direction on how to decipher this entry.

I have several other PX jobs that work fine with the grid parameters I listed earlier.

Let me know if you need any additional details in this regard.

Thanks for your time,
-V
lstsaur
Participant
Posts: 1139
Joined: Thu Oct 21, 2004 9:59 pm

Post by lstsaur »

Check your Sequential File stage's Properties --> Source --> File; make sure it is populated as
File=$APT_GRID_SEQFILE_HOST/pathname/*.txt
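
Given the SEQFILE Host(s) line in your log (ctpcqabdsc02p:, note the trailing colon), that property should resolve at run time to something like the following, where /pathname/*.txt stands in for your actual directory and pattern:

File=ctpcqabdsc02p:/pathname/*.txt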
VCInDSX
Premium Member
Posts: 223
Joined: Fri Apr 13, 2007 10:02 am
Location: US

Post by VCInDSX »

Apologies for the delayed response. Got pulled into a few other unnecessary distractions...

I added the GRID variable for the host to the file path and it still wouldn't give the desired results. However, when I added that and also enabled
$APT_IMPORT_PATTERN_USES_FILESET = True, I got another error, as follows.

SF_Input: Unable to generate a node map from fileset /tmp/import_tmp_20671db190272.fs.
main_program: Could not check all operators because of previous error(s)


On a separate note, we were asked to use $APT_GRID_SEQFILE_HOST only for output files and not when reading sequential files. Is that not the case?

Thanks,
-V
lstsaur
Participant
Posts: 1139
Joined: Thu Oct 21, 2004 9:59 pm

Post by lstsaur »

No, you can use $APT_GRID_SEQFILE_HOST for input. It returns the first host name identified either by the grid engine or from the IONODE names.
VCInDSX
Premium Member
Posts: 223
Joined: Fri Apr 13, 2007 10:02 am
Location: US

Post by VCInDSX »

Thanks lstsaur.
One additional question: if the source file is on the head node, will using $APT_GRID_SEQFILE_HOST interfere with the source file location if the head node is not returned as the host name during execution?

Does the error message I received point to such a symptom?

Thanks,
-V
Nripendra Chand
Premium Member
Posts: 196
Joined: Tue Nov 23, 2004 11:50 pm
Location: Sydney (Australia)

Post by Nripendra Chand »

I'm also getting the same problem. If I run my job without setting 'APT_IMPORT_PATTERN_USES_FILESET' to 'True', the file names come through as 'TestFilePattern????????.dat'.
But if I enable this environment variable, the job aborts with the following error message:
SQ_SrcFile: Unable to generate a node map from fileset /var/tmp/import_tmp_838635bc0dfb.fs.

Our DataStage server is in a grid environment and I've included all the required grid environment variables in the job, i.e.:
$APT_GRID_ENABLE
$APT_GRID_QUEUE
$APT_GRID_SEQFILE_HOST
$APT_GRID_FROM_PARTITIONS
$APT_GRID_FROM_NODES
$APT_GRID_COMPUTENODES
$APT_GRID_PARTITIONS
-Nripendra Chand
keshav0307
Premium Member
Posts: 783
Joined: Mon Jan 16, 2006 10:17 pm
Location: Sydney, Australia

Post by keshav0307 »

1. Add another parameter, $APT_GRID_HEAD_PARTITIONS = 2 (it must be more than 1).
2. Do not use a host qualifier for the sequential file; that is, remove the $APT_GRID_SEQFILE_HOST prefix from the file path, so it becomes File=/pathname/*.txt rather than File=$APT_GRID_SEQFILE_HOST/pathname/*.txt.
lstsaur
Participant
Posts: 1139
Joined: Thu Oct 21, 2004 9:59 pm

Post by lstsaur »

What version of the Grid Toolkit are you using? There is no such Toolkit variable as $APT_GRID_HEAD_PARTITIONS. Besides, why would you want to partition the head node when all the PX engines are on the compute nodes?
You don't run any jobs on the head node.
keshav0307
Premium Member
Posts: 783
Joined: Mon Jan 16, 2006 10:17 pm
Location: Sydney, Australia

Post by keshav0307 »

Because it works.
$APT_GRID_HEAD_PARTITIONS=2 generates two conductor-node partitions on the head node, not any compute-node partitions there; the other partition on the head node is used to generate the node map.
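
As a rough sketch only (patterned on the dynamically generated configuration posted earlier in this thread; the node names are placeholders, not actual Toolkit output), the generated file would then carry two head-node entries alongside the compute node, with the second head-node partition in the default "" pool:

{
    node "Conductor"
    {
        fastname "ctpcqabdsh01p"
        pools "conductor"
        resource disk "/nfsdata/data1/datasets" {pools ""}
        resource scratchdisk "/scratch" {pools ""}
    }
    node "Conductor_2"
    {
        fastname "ctpcqabdsh01p"
        pools ""
        resource disk "/nfsdata/data1/datasets" {pools ""}
        resource scratchdisk "/scratch" {pools ""}
    }
    node "node1_1"
    {
        fastname "ctpcqabdsc02p"
        pools ""
        resource disk "/nfsdata/data1/datasets" {pools ""}
        resource scratchdisk "/scratch" {pools ""}
    }
}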
lstsaur
Participant
Posts: 1139
Joined: Thu Oct 21, 2004 9:59 pm

Post by lstsaur »

Well, as I said before, in my grid environment there is no such variable as $APT_GRID_HEAD_PARTITIONS. The dynamically generated configuration file is produced by a Java program. I don't understand what you mean by "the other partition on the head node will be used to generate the node map".
keshav0307
Premium Member
Posts: 783
Joined: Mon Jan 16, 2006 10:17 pm
Location: Sydney, Australia

Post by keshav0307 »

The parameter $APT_GRID_HEAD_PARTITIONS is available in version 3.3.2 of the Grid Toolkit.