Job Performance

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
Nagac
Premium Member
Premium Member
Posts: 127
Joined: Tue Mar 29, 2011 11:39 am
Location: India

Job Performance

Post by Nagac »

Hi

I have Datastage Job which does Null Validation, Duplicate Validation and finally loads into Dataset. Which takes around 50MINS to complete(27 million Rows). I would like to improve the performance can some one advise on?

ODBC Stage --> Transformer(Validation) Sort--> Transformer(Duplicate Removal)-->Datastage.

Apart from these, i have validation fail links writes into Datastage.

We have 2 nodes

Below are the environment Variables used in project level:
Environment variable settings:
APT_COMPILEOPT=/D APT_USE_ANSI_IOSTREAMS /D _WIN32 /D _MBCS /nologo /W3 /WX- /Gm- /EHa /MD /GS- /fp:precise /Zc:wchar_t- /Zc:forScope /Gd /TP /Zi /Oy- /c
APT_COMPILER=cl
APT_CONFIG_FILE=E:/IBM/InformationServer/Server/Configurations/default.apt
APT_DEFAULT_TRANSPORT_BLOCK_SIZE=13107200
APT_DISABLE_COMBINATION=1
APT_ERROR_CONFIGURATION=severity, !vseverity, !jobid, moduleid, errorIndex, timestamp, !ipaddr, !nodeplayer, !nodename, opid, message
APT_IO_MAP=1
APT_IO_NOMAP=1
APT_LINKER=link
APT_LINKOPT=/INCREMENTAL:NO /NOLOGO /DLL /DEBUG /SUBSYSTEM:CONSOLE /DYNAMICBASE:NO /MACHINE:X86
APT_MAX_DELIMITED_READ_SIZE=204800
APT_MAX_TRANSPORT_BLOCK_SIZE=104857600
APT_MONITOR_MINTIME=10
APT_NO_IOCOMM_OPTIMIZATION=1
APT_NO_ONE_NODE_COMBINING_OPTIMIZATION=1
APT_OPERATOR_REGISTRY_PATH=K:\DS_Projects\projects\international\INTL_DEV\buildop
APT_ORCHHOME=E:/IBM/InformationServer/Server/PXEngine
APT_PHYSICAL_DATASET_BLOCK_SIZE=9000000
APT_USE_CRLF=1
APT_USE_IPV4=1
DB2INSTANCE=DB2
DS_ENABLE_RESERVED_CHAR_CONVERT=0
DS_OPERATOR_BUILDOP_DIR=buildop
DS_OPERATOR_WRAPPED_DIR=wrapped
DS_OPTIMIZE_FILE_BROWSE=0
DS_PX_RESET=1
DS_TDM_PIPE_OPEN_TIMEOUT=720
DS_TDM_TRACE_SUBROUTINE_CALLS=0
DS_USERNO=-4024
ISUSER=ChalamN
NUTCROOT=C:\PROGRA~2\MKSTOO~1
TMP=C:\Windows\TEMP
UNIVERSE_CONTROLLING_TERM=1
UNIVERSE_PARENT_PROCESS=13308
USER=DSTAGE\dsadm
USERDOMAIN=WORKGROUP
USERNAME=DSTAGE$
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Try allocating more memory in the Sort stage.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
PaulVL
Premium Member
Premium Member
Posts: 1315
Joined: Fri Dec 17, 2010 4:36 pm

Post by PaulVL »

Look to improve your SQL on the ODBC entry.

Seek to use a native connector stage instead of ODBC if possible.

A partition read will give you more rows per second on that extract, because you will have more connections to the database.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Since the bottleneck portion of your job has not been identified, it is impossible to give specific tuning advice.

If you make a copy of your job and just have your source ODBC into a PEEK stage and nothing else, what speed does that run at? If the speed is similar to your normal job speed then your bottleneck is your source stage and you can look into tuning that. If the job copy is very fast then tuning ODBC will do nothing, since that isn't the source of your problems.

While the bottleneck is most likely your read speed, you will need to check to make sure that this is indeed true.
Nagac
Premium Member
Premium Member
Posts: 127
Joined: Tue Mar 29, 2011 11:39 am
Location: India

Post by Nagac »

Thanks Everyone,

I have increased the RMU and Job looks fine.

Arndw: ODBC Stage extracting the 21K/Sec and when it goes to the Sort stage it's come down to 16K/Sec. I need some help on increasing the throughput in Sort Stage.

I tried increasing the Buffer in Sort Stage but couldn't see any difference. I had gone through one Article and its says we achieve some performance by increasing readahead(Unix), Is there any equivalent parameter in windows?

Thanks
Naga
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Where are your temporary and scratch spaces located? If they aren't on local disks you might want to change your configuration to increase speed.
Nagac
Premium Member
Premium Member
Posts: 127
Joined: Tue Mar 29, 2011 11:39 am
Location: India

Post by Nagac »

Arndw

Both are in Local disk(s)
battaliou
Participant
Posts: 155
Joined: Mon Feb 24, 2003 7:28 am
Location: London
Contact:

Post by battaliou »

Try sorting in the ODBC stage.
3NF: Every non-key attribute must provide a fact about the key, the whole key, and nothing but the key. So help me Codd.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

I agree with Martin, if you sort in the database that might be faster than doing so in DataStage and if the database is on another machine then it would transfer CPU usage from the DataStage server to that box.
Post Reply