dynLUT* files location question

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

dynLUT* files location question

Post by chulett »

Searched for people having the same problem and really can only find similar issues, so I'm hoping someone can answer a specific question or two.

Got a production PX job aborting with the following fatal error:

Code:

luKeys,0: Could not map table file "/etl1/tmp/FEP/lookuptable.20091024.tdjcnic (size 3506947648 bytes)": Invalid argument
Error finalizing / saving table /tmp/dynLUT224456d0339b9
Every other post I've found that mentions 'Error finalizing' specifically says "Not enough space" at the end of the first line; here we get "Invalid argument". I'm wondering if this is yet another odd Linux thing and it really is a space issue, which it seems it could be.

* What exactly is the relationship between the 'lookuptable' file that the config file puts into scratch space and the 'dynLUT' dynamic lookup table file that seems to be created in "/tmp" no matter what?

* Does it write first to /tmp and then in the 'finalize' step move it to scratch? Or is some information written to scratch and some to /tmp with the two files mapped/linked somehow?

* Do we have any control over its use of /tmp? Can that be redirected somehow? I know of the UVTEMP setting, but I'm pretty sure that's strictly Server related and not something PX uses... or does it? :?

This large volume worked in dev, but /tmp has 16GB there; on production it is a mere 1GB for some reason, and the message makes it look like it needs over 3GB. And yes, this is the first time that PX is processing a volume of this nature in production. They are looking to see if it can be increased, but I'd like some warm fuzzies that it will actually fix the problem. Anyone?

And yes, I've seen all the advice re: join v. lookup but the design is already in production and signed off on in QA, so really looking for options to get the existing design working over there.
-craig

"You can never have too many knives" -- Logan Nine Fingers
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

For what it's worth, I changed where UVTEMP was pointing and as I suspected it had no effect on this issue. :(
ray.wurlod
Participant
Posts: 54595
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

You will find that the UNIX environment variable TEMP is also a project environment variable - you might try changing that value.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Found a TMPDIR project environment variable whose default value is empty. I'll try changing that to point to our larger tmp area...
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Crapola, while that change did cause the "dynLUT" files to move to the new location, it did not solve the issue. It now aborts with:

Code:

luKeys,0: Could not map table file "/etl1/tmp/FEP/lookuptable.20091024.tdjcnic (size 3506947648 bytes)": Invalid argument 
Error finalizing / saving table /etl1/tmp/dynLUT224456d0339b9
There's gobs of space in "/etl1" compared to "/tmp", so this looks like something other than a space issue to me now. Any ideas on what else it might be? :cry:
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

:!: OK, here's another question for anyone that is playing along in our home audience.

The dev server where this works is running the 64bit edition of Linux, whereas the production server where this doesn't work is running the 32bit edition. Coincidence? Of no consequence? Smoking gun?
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Could this be similar to the "not enough space" issue that is resolved by remapping the object using "ldedit" (which I am not sure exists on your Linux)?
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

I was wondering the same thing. Haven't found an ldedit on the system yet, perhaps it's something optional we can download.
ray.wurlod
Participant
Posts: 54595
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Is large file support enabled? This file is 3GB.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

I'm *told* that it is, and that they've allegedly created files larger than that in the past. Any idea how I could verify that, short of trying to create a big ass file there myself?

OK, I can find existing files > 2GB but not one as big as the size noted in the error message. And, oddly enough, they've all been created by Server jobs, not PX jobs.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Forgot to come back and update this with the resolution. Got this back from IBM Engineering:
Do you have the ability to monitor the system resources while the job is running? I think what we might see is a process butting up against the 4GB size limit on 32-bit OSes. I suspect if you can run top while it's running, you'll see an osh process butt up against this limit just as the job aborts. I see this documented about the Lookup stage:

The Lookup Stage: It memory maps files. This means that you MUST have enough system memory available to store the entire contents of the file AND you must have enough disk space on the resource disk (defined in the APT_CONFIG_FILE) to shadow the file in memory.

I think the latter bit is what we might be running into: not that you don't have enough memory on the system, but that loading the whole map into memory hits the limit. I see another case where the issue was worked around by using a Join stage instead of a Lookup, and another that did this:

Changed the job to re-partition both the reference input to the lookup and the primary input to lookup to match on the keys. Because the lookup is running 4 way parallel and because we have explicitly partitioned the data, the lookup will disable memory sharing and the per process memory requirement is reduced on the reference input because of the data distribution. This enabled the job to complete.
The last bit of advice is what we went with, after I had a long chat with the people involved about designing jobs not just with the requirements in mind but also figuring the data volumes into the design. Pretty obvious to some folks here, I'm sure, but not necessarily to a "nooby" who can string stages together but may not really understand what goes on under the covers, or how auto partitioning is not your BFF. :wink: