Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.
Moderators: chulett , rschirm , roy
evans036
Premium Member
Posts: 72 Joined: Tue Jan 31, 2006 11:13 pm
Post
by evans036 » Sun Oct 07, 2007 2:09 pm
We have a prod job (very big and ugly) that generated an error similar to the one below.
In an effort to understand just how lookups work, I created a tiny job that does nothing more than build a big lookup (about 3GB).
It generates this message when I run it:
Code: Select all
Lookup_File_Set_1,0: Could not map table file "/dwhome/Ascential/DataStage/Datasets/lookuptable.20071007.j1tezqd (size 2997004496 bytes)": Cannot allocate memory [keylookup/keylookup.C:707]
Error finalizing / saving table /dwhome/test/app_data/temp/steveE/bigLookupLS [lookuptable/lookuptable.C:633]
I watch UNIX shared memory, all disk mount points and process memory as this job runs, and see no evidence of memory usage.
Does anyone know what memory this message is referring to?
thanks in advance,
steve
ray.wurlod
Participant
Posts: 54607 Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:
Post
by ray.wurlod » Sun Oct 07, 2007 3:21 pm
It's not shared memory, it's private memory of the player process executing the LUT_CreateOp operator. The same memory will be used subsequently by the LUT_ProcessOp operator. (These form a composite operator generated by the Lookup stage - you can see it in the score. They may be combined into the one player process if operator combination is enabled.)
You can set an environment variable to cause each player process to log its process ID. Capture the UNIX performance statistics at frequent intervals into a file, then play back the file filtering for the process(es) in question.
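The capture-and-replay idea can be sketched like this (the environment variable in question is, if memory serves, APT_PM_SHOW_PIDS). The PID and output path below are placeholders, and the sketch samples its own PID just so it runs as-is — substitute the player PID from the job log:

```shell
# Sample a process's memory use at intervals into a file, then "play back".
# PID is a placeholder: substitute the player process ID from the job log.
PID=$$                       # own PID here, just so the sketch is self-contained
OUT=/tmp/player_mem.log
: > "$OUT"                   # start with an empty capture file

i=0
while [ "$i" -lt 3 ]; do     # three 1-second samples; use many more in practice
    # one line per sample: timestamp, then VSZ and RSS (kbytes) for the process
    echo "$(date '+%H:%M:%S') $(ps -o vsz= -o rss= -p "$PID")" >> "$OUT"
    sleep 1
    i=$((i + 1))
done

cat "$OUT"                   # play back the capture, filtered to this PID
```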
Also monitor use of scratch disk while this job is running. Just as a test, try creating another configuration file containing more (much more?) scratch disk, and using that configuration file to run the job.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
evans036
Premium Member
Posts: 72 Joined: Tue Jan 31, 2006 11:13 pm
Post
by evans036 » Sun Oct 07, 2007 8:24 pm
Thanks for the response.
Scratch pool has about 300GB.
Watching process memory of LUT_CreateOp using the command:
pmap -x
All I see is 45MB of memory used by that process throughout job execution.
Maybe I'm looking in the wrong place?
Here is the 'ps' output:
Code: Select all
1 R evans036 27205 27202 99 85 0 - 11279 - 22:09 ? 00:01:21 parallel APT_LUTCreateOp in Lookup_File_Set_1
Here is the pmap output (which remained constant):
Code: Select all
evans036@phl-dwetl04:/dwhome/tws> pmap -x 27205
27205: parallel APT_LUTCreateOp in Lookup_File_Set_1
Address Kbytes RSS Anon Locked Mode Mapping
08048000 100 - - - r-x-- osh
08061000 16 - - - rw--- osh
08065000 1948 - - - rwx-- [ anon ]
40000000 88 - - - r-x-- ld-2.3.3.so
40016000 4 - - - rw--- ld-2.3.3.so
40017000 4 - - - rw--- [ anon ]
40018000 976 - - - r-x-- liborchosli686.so
4010c000 124 - - - rw--- liborchosli686.so
4012b000 4 - - - rw--- [ anon ]
4012c000 8368 - - - r-x-- liborchi686.so
40958000 1244 - - - rw--- liborchi686.so
40a8f000 112 - - - rw--- [ anon ]
40aab000 48 - - - r-x-- liborchmonitori686.so
40ab7000 8 - - - rw--- liborchmonitori686.so
40ab9000 876 - - - r-x-- liborchcorei686.so
40b94000 140 - - - rw--- liborchcorei686.so
40bb7000 4 - - - rw--- [ anon ]
40bb8000 372 - - - r-x-- librwtool.so
40c15000 84 - - - rw--- librwtool.so
40c2a000 52 - - - r-x-- libpthread.so.0
40c37000 4 - - - rw--- libpthread.so.0
40c38000 12 - - - rw--- [ anon ]
40c3b000 8 - - - r-x-- libdl.so.2
40c3d000 4 - - - rw--- libdl.so.2
40c3e000 648 - - - r-x-- libstdc++.so.5.0.6
40ce0000 88 - - - rw--- libstdc++.so.5.0.6
40cf6000 20 - - - rw--- [ anon ]
40cfb000 132 - - - r-x-- libm.so.6
40d1c000 4 - - - rw--- libm.so.6
40d1d000 28 - - - r-x-- libgcc_s.so.1
40d24000 4 - - - rw--- libgcc_s.so.1
40d25000 1084 - - - r-x-- libc.so.6
40e34000 36 - - - rw--- libc.so.6
40e3d000 8 - - - rw--- [ anon ]
40e3f000 560 - - - r-x-- libicuuc.so.22.0
40ecb000 20 - - - rw--- libicuuc.so.22.0
40ed0000 8 - - - rw--- [ anon ]
40ed2000 756 - - - r-x-- libicui18n.so.22.0
40f8f000 16 - - - rw--- libicui18n.so.22.0
40f93000 52 - - - r-x-- libustdio.so.22.0
40fa0000 8 - - - rw--- libustdio.so.22.0
40fa2000 11544 - - - r-x-- libicudata.so.22.0
41ae8000 4 - - - rw--- libicudata.so.22.0
41ae9000 8 - - - rw--- [ anon ]
41aeb000 32 - - - r-x-- libnss_files.so.2
41af3000 4 - - - rw--- libnss_files.so.2
41af4000 12 - - - r-x-- liborchio64i686.so
41af7000 4 - - - rw--- liborchio64i686.so
41b15000 132 - - - rw--- [ anon ]
41b37000 2324 - - - r-x-- liborchgenerali686.so
41d7c000 296 - - - rw--- liborchgenerali686.so
41dc6000 24 - - - rw--- [ anon ]
41dcc000 492 - - - r-x-- liborchsorti686.so
41e47000 60 - - - rw--- liborchsorti686.so
41e56000 4 - - - rw--- [ anon ]
41e57000 1560 - - - r-x-- liborchstatsi686.so
41fdd000 192 - - - rw--- liborchstatsi686.so
4200d000 12 - - - rw--- [ anon ]
42010000 1092 - - - r-x-- liborchbuildopi686.so
42121000 176 - - - rw--- liborchbuildopi686.so
4214d000 64 - - - r-x-- V0S3_BLCreateLUFS_Transformer_3.so
4215d000 8 - - - rw--- V0S3_BLCreateLUFS_Transformer_3.so
4215f000 256 - - - rwxs- apTvS2721027210f99bad14 (deleted)
4219f000 256 - - - rwxs- apTvS2720727207baab7704 (deleted)
421df000 256 - - - rwxs- apTvS272062720699bb0103 (deleted)
4221f000 4 - - - ----- [ anon ]
42220000 8192 - - - rw--- [ anon ]
bfff7000 36 - - - rw--- [ stack ]
ffffe000 4 - - - ----- [ anon ]
-------- ------- ------- ------- -------
total kB 45120 - - -
evans036@phl-dwetl04:/dwhome/tws> pmap -x 27205
evans036@phl-dwetl04:/dwhome/tws>
ArndW
Participant
Posts: 16318 Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:
Post
by ArndW » Sun Oct 07, 2007 10:26 pm
Can you add a call in your before-job to issue a "ulimit -a" and see what the actual runtime values are?
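For anyone following along, the per-process limits most relevant to a large memory-mapped lookup table are the virtual memory and data segment sizes; besides the full listing, they can be checked individually:

```shell
ulimit -a          # full listing, as suggested above
ulimit -v          # virtual memory (kbytes) -- also caps mmap'd regions
ulimit -d          # data segment size (kbytes)
```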
evans036
Premium Member
Posts: 72 Joined: Tue Jan 31, 2006 11:13 pm
Post
by evans036 » Mon Oct 08, 2007 5:54 am
ArndW wrote: Can you add a call in your before-job to issue a "ulimit -a" and see what the actual runtime values are?
Excellent idea.
The output yielded this:
Code: Select all
BLCreateLUFS..BeforeJob (ExecSH): Executed command: ulimit -a
*** Output from command was: ***
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
file size (blocks, -f) unlimited
max locked memory (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 153600
virtual memory (kbytes, -v) unlimited
I believe these values are OK.
BTW, this is v7.5.2 on SUSE Linux Enterprise Server 9.
Ideas?
thanks in advance,
steve
ArndW
Participant
Posts: 16318 Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:
Post
by ArndW » Mon Oct 08, 2007 7:33 pm
Those ulimit values look good. Are you using the same dataset in 2 or more lookups in your job?
evans036
Premium Member
Posts: 72 Joined: Tue Jan 31, 2006 11:13 pm
Post
by evans036 » Mon Oct 08, 2007 9:08 pm
ArndW wrote: Those ulimit values look good. Are you using the same dataset in 2 or more lookups in your job?
No. The job is like this:
dataset -> transformer -> lookup
BTW, I logged a case with IBM. They are saying it is a shared memory shortage. The lookup dataset is about 3GB and SHMMAX is set to 1GB, so they might be right.
They also say that the memory is obtained in a single request (which in my case is denied). This would account for why I don't see any memory usage.
I have upped shmmax to 5GB (we have 18GB on the box) and reran the job, but it failed with the same error.
I did not reboot the Linux box (I used sysctl -p to apply the new kernel parameters), but maybe I do need to.
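For reference, this is roughly how the kernel parameter can be inspected and changed on Linux; the 5GB figure is the value mentioned above and is purely illustrative:

```shell
# Current limit, in bytes (Linux exposes it under /proc)
cat /proc/sys/kernel/shmmax

# To change it persistently: add the line below to /etc/sysctl.conf, then run
# 'sysctl -p' as root. The new value takes effect without a reboot, but only
# for shared memory segments created after the change.
#   kernel.shmmax = 5368709120
```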
I will keep you posted.
Any other ideas welcome.
thanks,
steve
ArndW
Participant
Posts: 16318 Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:
Post
by ArndW » Mon Oct 08, 2007 9:52 pm
I think you should see if the error remains after rebooting. It probably will go the way of the dodo bird.
sree_blr
Participant
Posts: 4 Joined: Tue Oct 09, 2007 10:55 am
Post
by sree_blr » Mon Oct 15, 2007 5:39 am
evans036 wrote: We have a prod job (very big and ugly) that generated an error similar to the one below.
In an effort to understand just how lookups work, I created a tiny job that does nothing more than build a big lookup (about 3GB).
It generates this message when I run it:
Code: Select all
Lookup_File_Set_1,0: Could not map table file "/dwhome/Ascential/DataStage/Datasets/lookuptable.20071007.j1tezqd (size 2997004496 bytes)": Cannot allocate memory [keylookup/keylookup.C:707]
Error finalizing / saving table /dwhome/test/app_data/temp/steveE/bigLookupLS [lookuptable/lookuptable.C:633]
I watch UNIX shared memory, all disk mount points and process memory as this job runs, and see no evidence of memory usage.
Does anyone know what memory this message is referring to?
thanks in advance,
steve
Are you hitting 32-bit limits? Try using a file less than 2GB.
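That 32-bit angle is worth checking: a 32-bit process on Linux has roughly 3GB of user address space, and the error text shows the engine trying to map the whole table file into it, so a ~3GB table can fail with "Cannot allocate memory" even with plenty of free RAM. A quick sketch of how to check the word size — the osh path and the APT_ORCHHOME fallback below are assumptions, adjust for your install:

```shell
# Word size of the current environment (prints 32 or 64)
getconf LONG_BIT

# Inspect the engine binary itself; this path is an assumption.
OSH="${APT_ORCHHOME:-/dwhome/Ascential/DataStage/PXEngine}/bin/osh"
if [ -x "$OSH" ]; then
    file "$OSH"          # e.g. "ELF 32-bit LSB executable, Intel 80386 ..."
else
    file -L /bin/sh      # fallback: shows the same kind of output format
fi
```

(For what it's worth, the pmap output above — i686 libraries, mappings based at 0x40000000 — looks like a 32-bit process layout.)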
evans036
Premium Member
Posts: 72 Joined: Tue Jan 31, 2006 11:13 pm
Post
by evans036 » Mon Oct 15, 2007 6:08 am
Are you hitting 32-bit limits? Try using a file less than 2GB.
That's a good question.
I did try a file size of about 1.5GB and that did not work; anything under 1GB seems to work. So I am assuming it's not a 32-bit limitation.
IBM are saying it's shared memory. Bumping up the shared memory max and rebooting the machine did not help.
I am awaiting further response from IBM. I will keep you all posted here.
thanks,
steve
sanjay
Premium Member
Posts: 203 Joined: Fri Apr 23, 2004 2:22 am
Post
by sanjay » Mon Oct 15, 2007 7:32 am
steve
We got the same error. We added a Remove Duplicates stage on the lookup key and then did the lookup; it worked fine afterwards. You can try that.
Sanjay
evans036 wrote: Are you hitting 32-bit limits? Try using a file less than 2GB.
That's a good question.
I did try a file size of about 1.5GB and that did not work; anything under 1GB seems to work. So I am assuming it's not a 32-bit limitation.
IBM are saying it's shared memory. Bumping up the shared memory max and rebooting the machine did not help.
I am awaiting further response from IBM. I will keep you all posted here.
Thanks,
steve
evans036
Premium Member
Posts: 72 Joined: Tue Jan 31, 2006 11:13 pm
Post
by evans036 » Mon Oct 15, 2007 7:42 am
sanjay,
This particular lookup is built on a unique key, so in this case it would not help.
BTW, are dups not 'allowed' in lookups?
thanks,
steve
sanjay
Premium Member
Posts: 203 Joined: Fri Apr 23, 2004 2:22 am
Post
by sanjay » Mon Oct 15, 2007 8:25 am
steve,
Are you sure the 3GB of data is unique? We had duplicate records, and we were using a dataset for the lookup.
sanjay
evans036 wrote: sanjay,
This particular lookup is built on a unique key, so in this case it would not help.
BTW, are dups not 'allowed' in lookups?
Thanks,
steve