Big lookup: cannot allocate memory error

evans036
Premium Member
Posts: 72
Joined: Tue Jan 31, 2006 11:13 pm

Post by evans036 »

we have a prod job (very big & ugly) that generated an error similar to the one below.

in an effort to understand just how lookups work, i created a tiny job that does nothing more than build a big lookup (about 3GB).

It generates this message when i run it:

Code:

Lookup_File_Set_1,0: Could not map table file "/dwhome/Ascential/DataStage/Datasets/lookuptable.20071007.j1tezqd (size 2997004496 bytes)": Cannot allocate memory [keylookup/keylookup.C:707]
Error finalizing / saving table /dwhome/test/app_data/temp/steveE/bigLookupLS [lookuptable/lookuptable.C:633]

i watch unix shared memory, all disk mount points & process memory as this job runs and see no evidence of memory usage.

does anyone know what memory this message is referring to?

thanks in advance,
steve
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

It's not shared memory, it's private memory of the player process executing the LUT_CreateOp operator. The same memory will be used subsequently by the LUT_ProcessOp operator. (These form a composite operator generated by the Lookup stage - you can see it in the score. They may be combined into the one player process if operator combination is enabled.)

You can set an environment variable to cause each player process to log its process ID. Capture the UNIX performance statistics at frequent intervals into a file, then play back the file filtering for the process(es) in question.

Also monitor use of scratch disk while this job is running. Just as a test, try creating another configuration file containing more (much more?) scratch disk, and using that configuration file to run the job.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
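
A minimal sketch of the sampling approach suggested above, assuming a Linux host where /proc/<pid>/status exposes VmSize and VmRSS. The function names, the log file name player_mem.log, and the one-second interval are illustrative choices and are not part of DataStage. Keep in mind that a single large mapping request that is refused outright would never show up in samples like these.

```python
import sys
import time
from pathlib import Path


def parse_vm_fields(status_text):
    """Pull VmSize and VmRSS (both in kB) out of /proc/<pid>/status text."""
    fields = {}
    for line in status_text.splitlines():
        key, _, rest = line.partition(":")
        if key in ("VmSize", "VmRSS"):
            fields[key] = int(rest.split()[0])
    return fields


def sample(pid, count=10, interval_s=1.0, out_path="player_mem.log"):
    """Append one timestamped reading per interval until count is reached
    or the process exits."""
    status = Path(f"/proc/{pid}/status")
    with open(out_path, "a") as log:
        for _ in range(count):
            try:
                fields = parse_vm_fields(status.read_text())
            except OSError:
                break  # the process has gone away
            log.write(f"{time.time():.0f} {pid} {fields}\n")
            log.flush()
            time.sleep(interval_s)


# Run as: python sample_mem.py <pid>   (script name is hypothetical)
if __name__ == "__main__" and len(sys.argv) == 2 and sys.argv[1].isdigit():
    sample(int(sys.argv[1]))
```

The resulting log can then be filtered by process ID once the player's PID is known.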
evans036
Premium Member
Posts: 72
Joined: Tue Jan 31, 2006 11:13 pm

Post by evans036 »

thanks for the response.

scratch pool has about 300GB

watching process memory of LUT_CreateOp using command:

pmap -x

all i see is 45MB of memory used by that process throughout job execution.

maybe i'm looking in the wrong place?

here is 'ps' output:

Code:

1 R evans036 27205 27202 99  85   0 - 11279 -      22:09 ?        00:01:21 parallel APT_LUTCreateOp in Lookup_File_Set_1

here is pmap output (which remained constant):

Code:

evans036@phl-dwetl04:/dwhome/tws> pmap -x 27205
27205:   parallel APT_LUTCreateOp in Lookup_File_Set_1                                                                                          
Address   Kbytes     RSS    Anon  Locked Mode   Mapping
08048000     100       -       -       - r-x--  osh
08061000      16       -       -       - rw---  osh
08065000    1948       -       -       - rwx--    [ anon ]
40000000      88       -       -       - r-x--  ld-2.3.3.so
40016000       4       -       -       - rw---  ld-2.3.3.so
40017000       4       -       -       - rw---    [ anon ]
40018000     976       -       -       - r-x--  liborchosli686.so
4010c000     124       -       -       - rw---  liborchosli686.so
4012b000       4       -       -       - rw---    [ anon ]
4012c000    8368       -       -       - r-x--  liborchi686.so
40958000    1244       -       -       - rw---  liborchi686.so
40a8f000     112       -       -       - rw---    [ anon ]
40aab000      48       -       -       - r-x--  liborchmonitori686.so
40ab7000       8       -       -       - rw---  liborchmonitori686.so
40ab9000     876       -       -       - r-x--  liborchcorei686.so
40b94000     140       -       -       - rw---  liborchcorei686.so
40bb7000       4       -       -       - rw---    [ anon ]
40bb8000     372       -       -       - r-x--  librwtool.so
40c15000      84       -       -       - rw---  librwtool.so
40c2a000      52       -       -       - r-x--  libpthread.so.0
40c37000       4       -       -       - rw---  libpthread.so.0
40c38000      12       -       -       - rw---    [ anon ]
40c3b000       8       -       -       - r-x--  libdl.so.2
40c3d000       4       -       -       - rw---  libdl.so.2
40c3e000     648       -       -       - r-x--  libstdc++.so.5.0.6
40ce0000      88       -       -       - rw---  libstdc++.so.5.0.6
40cf6000      20       -       -       - rw---    [ anon ]
40cfb000     132       -       -       - r-x--  libm.so.6
40d1c000       4       -       -       - rw---  libm.so.6
40d1d000      28       -       -       - r-x--  libgcc_s.so.1
40d24000       4       -       -       - rw---  libgcc_s.so.1
40d25000    1084       -       -       - r-x--  libc.so.6
40e34000      36       -       -       - rw---  libc.so.6
40e3d000       8       -       -       - rw---    [ anon ]
40e3f000     560       -       -       - r-x--  libicuuc.so.22.0
40ecb000      20       -       -       - rw---  libicuuc.so.22.0
40ed0000       8       -       -       - rw---    [ anon ]
40ed2000     756       -       -       - r-x--  libicui18n.so.22.0
40f8f000      16       -       -       - rw---  libicui18n.so.22.0
40f93000      52       -       -       - r-x--  libustdio.so.22.0
40fa0000       8       -       -       - rw---  libustdio.so.22.0
40fa2000   11544       -       -       - r-x--  libicudata.so.22.0
41ae8000       4       -       -       - rw---  libicudata.so.22.0
41ae9000       8       -       -       - rw---    [ anon ]
41aeb000      32       -       -       - r-x--  libnss_files.so.2
41af3000       4       -       -       - rw---  libnss_files.so.2
41af4000      12       -       -       - r-x--  liborchio64i686.so
41af7000       4       -       -       - rw---  liborchio64i686.so
41b15000     132       -       -       - rw---    [ anon ]
41b37000    2324       -       -       - r-x--  liborchgenerali686.so
41d7c000     296       -       -       - rw---  liborchgenerali686.so
41dc6000      24       -       -       - rw---    [ anon ]
41dcc000     492       -       -       - r-x--  liborchsorti686.so
41e47000      60       -       -       - rw---  liborchsorti686.so
41e56000       4       -       -       - rw---    [ anon ]
41e57000    1560       -       -       - r-x--  liborchstatsi686.so
41fdd000     192       -       -       - rw---  liborchstatsi686.so
4200d000      12       -       -       - rw---    [ anon ]
42010000    1092       -       -       - r-x--  liborchbuildopi686.so
42121000     176       -       -       - rw---  liborchbuildopi686.so
4214d000      64       -       -       - r-x--  V0S3_BLCreateLUFS_Transformer_3.so
4215d000       8       -       -       - rw---  V0S3_BLCreateLUFS_Transformer_3.so
4215f000     256       -       -       - rwxs-  apTvS2721027210f99bad14 (deleted)
4219f000     256       -       -       - rwxs-  apTvS2720727207baab7704 (deleted)
421df000     256       -       -       - rwxs-  apTvS272062720699bb0103 (deleted)
4221f000       4       -       -       - -----    [ anon ]
42220000    8192       -       -       - rw---    [ anon ]
bfff7000      36       -       -       - rw---    [ stack ]
ffffe000       4       -       -       - -----    [ anon ]
-------- ------- ------- ------- -------
total kB   45120       -       -       -
evans036@phl-dwetl04:/dwhome/tws> pmap -x 27205
evans036@phl-dwetl04:/dwhome/tws> 
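
One thing the pmap listing does show is the address-space layout, and the i686 library names suggest a 32-bit process. The following is a rough sketch that computes the largest unmapped gap between the regions pmap reports. It assumes the user address space ends at 0xC0000000 (a common 32-bit Linux split, taken here as an assumption), and the function names are illustrative. The sample rows are a subset copied from the listing above.

```python
SAMPLE = """\
08048000     100       -       -       - r-x--  osh
08061000      16       -       -       - rw---  osh
08065000    1948       -       -       - rwx--    [ anon ]
40000000      88       -       -       - r-x--  ld-2.3.3.so
4221f000       4       -       -       - -----    [ anon ]
42220000    8192       -       -       - rw---    [ anon ]
bfff7000      36       -       -       - rw---    [ stack ]
ffffe000       4       -       -       - -----    [ anon ]
"""

TABLE_BYTES = 2997004496  # size reported in the error message


def parse_pmap(text):
    """Return sorted (start, end) byte ranges from `pmap -x` style rows.
    Rows whose first two columns are not address and Kbytes are skipped."""
    regions = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue
        try:
            start = int(parts[0], 16)
            size_kb = int(parts[1])
        except ValueError:
            continue
        regions.append((start, start + size_kb * 1024))
    return sorted(regions)


def largest_gap(regions, user_limit=0xC0000000):
    """Largest hole between consecutive mappings below user_limit
    (user_limit is an assumed 3G/1G split, not taken from the thread)."""
    best = 0
    prev_end = None
    for start, end in regions:
        if start >= user_limit:
            break
        if prev_end is not None:
            best = max(best, start - prev_end)
        prev_end = end
    return best


if __name__ == "__main__":
    gap = largest_gap(parse_pmap(SAMPLE))
    print("largest gap:", gap, "table fits:", TABLE_BYTES <= gap)
```

With the rows above, the largest hole is 2,103,275,520 bytes (between the last anonymous mapping and the stack), which is smaller than the 2,997,004,496-byte table. This does not prove that address space is the cause; it only shows that a single contiguous mapping of that size would not fit in the layout shown.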
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Can you add a call in your before-job to issue a "ulimit -a" and see what the actual runtime values are?
evans036
Premium Member
Posts: 72
Joined: Tue Jan 31, 2006 11:13 pm

Post by evans036 »

ArndW wrote:Can you add a call in your before-job to issue a "ulimit -a" and see what the actual runtime values are?
excellent idea.

the output yielded this:

Code:

BLCreateLUFS..BeforeJob (ExecSH): Executed command: ulimit -a
*** Output from command was: ***
core file size        (blocks, -c) 0
data seg size         (kbytes, -d) unlimited
file size             (blocks, -f) unlimited
max locked memory     (kbytes, -l) unlimited
max memory size       (kbytes, -m) unlimited
open files                    (-n) 1024
pipe size          (512 bytes, -p) 8
stack size            (kbytes, -s) 8192
cpu time             (seconds, -t) unlimited
max user processes            (-u) 153600
virtual memory        (kbytes, -v) unlimited

i believe these values are ok

btw, this is v7.5.2 on Suse Enterprise 9

ideas?

thanks in advance,
steve
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Those ulimit values look good. Are you using the same dataset in 2 or more lookups in your job?
evans036
Premium Member
Posts: 72
Joined: Tue Jan 31, 2006 11:13 pm

Post by evans036 »

ArndW wrote:Those ulimit values look good. Are you using the same dataset in 2 or more lookups in your job?
no. the job is like this:

dataset -> transformer -> lookup

btw, i logged a case with IBM. They are saying it is a shared memory shortage. The lookup dataset is about 3GB and SHMMAX is set to 1GB, so they might be right.

they also say that the memory is obtained in a single request (which in my case is denied). This would account for why i don't see any memory usage.

I have upped shmmax to 5GB (we have 18GB on the box) and reran the job, but it failed with the same error.

i did not reboot the linux box (i used sysctl -p to load the new kernel parameters), but maybe i do need to.

i will keep you posted.

any other ideas welcome

thanks,
steve
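
A small sketch for checking whether a new SHMMAX value is actually live without a reboot, assuming a Linux host where the current value is readable at /proc/sys/kernel/shmmax. The helper names are illustrative, and treating "1GB" and "5GB" as binary gigabytes is an assumption (the comparison comes out the same with decimal gigabytes).

```python
from pathlib import Path

GIB = 1024 ** 3
TABLE_BYTES = 2997004496  # size reported in the error message


def fits(table_bytes, shmmax_bytes):
    """True if a single segment of table_bytes is within the shmmax limit."""
    return table_bytes <= shmmax_bytes


def live_shmmax(path="/proc/sys/kernel/shmmax"):
    """Return the kernel's current shmmax in bytes, or None if unreadable."""
    try:
        return int(Path(path).read_text().split()[0])
    except (OSError, ValueError, IndexError):
        return None


if __name__ == "__main__":
    print("fits under 1 GiB limit:", fits(TABLE_BYTES, 1 * GIB))
    print("fits under 5 GiB limit:", fits(TABLE_BYTES, 5 * GIB))
    current = live_shmmax()
    if current is not None:
        print("live shmmax:", current, "fits:", fits(TABLE_BYTES, current))
```

If the live value already reads 5GB and the job still fails, the shared memory limit is probably not the whole story.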
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

I think you should see if the error remains after rebooting. It probably will go the way of the dodo bird.
sree_blr
Participant
Posts: 4
Joined: Tue Oct 09, 2007 10:55 am

Re: Big lookup: cannot allocate memory error

Post by sree_blr »

Are you hitting 32-bit limits? Try using a file smaller than 2GB.
evans036
Premium Member
Posts: 72
Joined: Tue Jan 31, 2006 11:13 pm

Post by evans036 »

sree_blr wrote:are you getting 32 bit limits. try using file less than 2GB.
that's a good question.

i did try a file size of about 1.5GB and that did not work. anything under 1GB seems to work. so i am assuming it's not a 32 bit limitation.

IBM are saying it's shared memory. bumping up the shared memory max & rebooting the machine did not help.

I am awaiting further response from IBM. i will keep you all posted here.

thanks,
steve
sanjay
Premium Member
Posts: 203
Joined: Fri Apr 23, 2004 2:22 am

Post by sanjay »

steve

we got the same error. what we did was add a Remove Duplicates stage on the lookup key and then do the lookup. it worked fine afterwards; you can try that.

Sanjay
evans036
Premium Member
Posts: 72
Joined: Tue Jan 31, 2006 11:13 pm

Post by evans036 »

sanjay,

this particular lookup is built on a unique key - so in this case it would not help.

btw, are dups not 'allowed' in lookups?

thanks,

steve
sanjay
Premium Member
Posts: 203
Joined: Fri Apr 23, 2004 2:22 am

Post by sanjay »

steve

are you sure the 3GB of data is unique? we had duplicate records, and we were using a dataset for the lookup.

sanjay