Performance benchmarks on External Source stage

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

pavan5035
Participant
Posts: 4
Joined: Fri Jun 12, 2009 4:43 am
Location: HYD


Post by pavan5035 »

Hello,

We ran some performance benchmarks on the External Source stage.

Process 1:
Seq1:

Execute Command activity (Perl script writing to a file) --> Job Activity

Job:

Sequential File stage --> Peek stage

Process 2:

External Source stage --> Peek stage

Command in the External Source stage:
*:#$USR_DR_SCRIPT#/txl_preprocessor.pl #$USR_DR_PREPROC_FROM# #$USR_DR_PREPROC_TO#

The Perl script is the same in Process 1 and Process 2.

The Perl script takes a prerequisite set of files (for example, 10000 files), cleans the hex values at the start of each record, and writes the data to STDOUT. In Process 1 we redirect that output to one single file and read it with a Sequential File stage.
In Process 2 we run the Perl command from the External Source stage and stream its STDOUT directly into DataStage.
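The Perl script itself is not posted, so the following is only a minimal Python sketch of the pipeline described above, under an assumed cleaning rule: it treats "hex values at the start of the record" as leading control bytes (0x00 to 0x1F except newline, plus 0x7F), which may not be what txl_preprocessor.pl actually does. The names `clean_record` and `preprocess` are hypothetical. What it does show is why the two processes can share one script: the script always writes to STDOUT, and only the consumer differs (a shell redirect to one file in Process 1, the External Source stage in Process 2).

```python
import re
import sys
from typing import BinaryIO, Iterable

# ASSUMED cleaning rule (hypothetical): strip leading control bytes
# 0x00-0x09, 0x0B-0x1F and 0x7F. Newline (0x0A) is excluded so an
# all-junk record still keeps its record terminator.
LEADING_JUNK = re.compile(rb"^[\x00-\x09\x0b-\x1f\x7f]+")


def clean_record(line: bytes) -> bytes:
    """Remove leading control bytes from one record, keeping its newline."""
    return LEADING_JUNK.sub(b"", line)


def preprocess(paths: Iterable[str], out: BinaryIO) -> int:
    """Clean every record of every input file and write it to `out`.

    Returns the number of records written.
    """
    count = 0
    for path in paths:
        with open(path, "rb") as f:
            for line in f:  # binary iteration splits on b"\n"
                out.write(clean_record(line))
                count += 1
    return count


if __name__ == "__main__":
    # Same script for both processes: it always writes to STDOUT.
    # Process 1: the shell redirects STDOUT to one big file.
    # Process 2: the External Source stage reads STDOUT directly.
    preprocess(sys.argv[1:], sys.stdout.buffer)
```

For example, `python preprocess.py in1.txt in2.txt > one_big_file` mirrors Process 1, and the same command without the redirect mirrors what Process 2 feeds to the stage (the script name is hypothetical).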

The test results contradict the general hard and fast rule that writing to a file and then reading from that file is more costly than reading directly from STDOUT. Here are the test results:

STREAMING (Process 2)
Files per Batch   Start Time   End Time   Elapsed Time (m:ss)   # of records
20000             15:16:38     15:20:18   3:40                  13028722
                  15:28:45     15:32:16   3:31                  13028722
                  15:33:03     15:36:33   3:30                  13028722
                  16:05:21     16:08:54   3:33                  13028722
                  16:09:48     16:13:19   3:31                  13028722


WRITING TO FILE (Process 1)
Files per Batch   Start Time   End Time   Elapsed Time (m:ss)   # of records
                  15:23:00     15:26:10   3:10                  13028722
                  15:48:35     15:51:39   3:04                  13028722
                  15:52:51     15:55:56   3:05                  13028722
                  15:57:03     16:00:09   3:06                  13028722
                  16:00:44     16:03:49   3:05                  13028722
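To compare the two tables on the same footing, the elapsed times can be converted to seconds and averaged. This small Python script does that using only the numbers posted above; it assumes the Elapsed Time column is minutes:seconds, which the start and end times confirm. By this arithmetic the streaming runs average 213 s (about 61,168 records/s) and the file-based runs average 186 s (about 70,047 records/s), so streaming took roughly 14.5% longer.

```python
# Elapsed times copied from the two tables above (minutes:seconds).
STREAMING = ["3:40", "3:31", "3:30", "3:33", "3:31"]   # Process 2
FILE_BASED = ["3:10", "3:04", "3:05", "3:06", "3:05"]  # Process 1
RECORDS = 13_028_722  # same record count in every run


def to_seconds(mmss: str) -> int:
    """Convert an 'm:ss' string to whole seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def summarize(runs):
    """Return (mean elapsed seconds, records per second at that mean)."""
    secs = [to_seconds(r) for r in runs]
    mean = sum(secs) / len(secs)
    return mean, RECORDS / mean


stream_mean, stream_rate = summarize(STREAMING)
file_mean, file_rate = summarize(FILE_BASED)
print(f"streaming: {stream_mean:.0f} s, {stream_rate:,.0f} rec/s")
print(f"file:      {file_mean:.0f} s, {file_rate:,.0f} rec/s")
print(f"streaming took {100 * (stream_mean / file_mean - 1):.1f}% longer")
```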

Both processes run in a 2-node configuration. The External Source stage is constrained to run on one node because, on two nodes, it runs two instances of the Perl script in the background and duplicates the same data.
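The duplication happens because each node's instance of the script reads the same file list. One hypothetical way around it, not something tested in this thread, would be for each instance to take a disjoint slice of that list. This Python sketch assumes each instance can somehow be told its 0-based instance number and the total instance count (for example as extra script arguments); how to obtain those values inside the External Source stage is not covered here, and `files_for_instance` is a hypothetical helper.

```python
from typing import List, Sequence


def files_for_instance(paths: Sequence[str], part: int, count: int) -> List[str]:
    """Round-robin split of the input files across `count` instances.

    Instance `part` (0-based) takes every count-th file starting at `part`,
    so the slices are disjoint and together cover every file exactly once.
    Sorting first makes every instance see the same order.
    """
    if not 0 <= part < count:
        raise ValueError("part must be in [0, count)")
    return sorted(paths)[part::count]
```

With two instances, instance 0 would process files 0, 2, 4, ... and instance 1 files 1, 3, 5, ..., instead of both processing all of them.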

Details of the server where the tests were conducted:

Linux linux5960 2.6.32-431.20.3.el6.x86_64 #1 SMP Fri Jun 6 18:30:54 EDT 2014 x86_64 x86_64 x86_64 GNU/Linux
[wicdsadp@linux5960 ~]$ ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 256724
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 65536
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 32768
cpu time (seconds, -t) unlimited
max user processes (-u) 20480
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
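One detail in this output is easy to misread: `pipe size (512 bytes, -p) 8` is 8 × 512 = 4096 bytes, but on Linux that figure is the atomic-write limit (PIPE_BUF), not how much data a pipe can hold; since kernel 2.6.11 the default pipe capacity is 64 KiB. The capacity matters for streaming because once it is full the writer (the Perl script) must wait for the reader. This Python sketch measures the capacity of an anonymous pipe empirically by filling it with non-blocking writes until one would block. It is a generic Unix check, not a DataStage setting, and whether pipe capacity explains the timings above is not established by this thread.

```python
import os


def measure_pipe_capacity(chunk: int = 512) -> int:
    """Count how many bytes an anonymous pipe accepts before a write blocks.

    Nobody reads from the pipe, so non-blocking writes succeed until the
    kernel buffer is full and the next write raises BlockingIOError.
    `chunk` is kept <= PIPE_BUF so each write is all-or-nothing.
    """
    r, w = os.pipe()
    try:
        os.set_blocking(w, False)
        data = b"x" * chunk
        total = 0
        while True:
            try:
                total += os.write(w, data)
            except BlockingIOError:
                return total
    finally:
        os.close(r)
        os.close(w)


if __name__ == "__main__":
    print(f"ulimit -p figure: {8 * 512} bytes")
    print(f"measured pipe capacity: {measure_pipe_capacity()} bytes")
```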


Question: Is there any technical reason why writing to a file and reading from that file is faster than reading from STDOUT? I am ready to provide additional information. Is there any tuning I can do in the External Source stage?

Thank You!
Pavan
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Re: Performance benchmarks on External Source stage

Post by chulett »

pavan5035 wrote:Test results are contradicting to general hard and fast rule that writing to a file and reading from a file is a costly operation than reading directly from STDOUT.
It's an interesting discussion and test but to be honest I've never heard of this "general hard and fast rule". For whatever that is worth. :wink:

Oh, and welcome.
-craig

"You can never have too many knives" -- Logan Nine Fingers
pavan5035
Participant
Posts: 4
Joined: Fri Jun 12, 2009 4:43 am
Location: HYD

@chulett

Post by pavan5035 »

So is it right to revise my assumption and conclude that, with two separate processes, one writing to a file and the other reading from that file is faster than one passing the data to the other through memory? I was under the impression that disk I/O is always more costly than passing data between processes in memory.
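One way to probe this question outside DataStage is a toy harness like the Python sketch below, which hands the same data from a producer process to a consumer either through a pipe (like Process 2) or through a temporary file that is read afterwards (like Process 1). The producer, the 100-byte line size, and all function names are assumptions for illustration only. It is worth knowing that on Linux a freshly written file is normally held in the page cache, so "write then read" does not necessarily mean physical disk reads.

```python
import os
import subprocess
import sys
import tempfile
import time

# Hypothetical producer: writes n lines of 100 bytes each to STDOUT.
PRODUCER = (
    "import sys\n"
    "n = int(sys.argv[1])\n"
    "line = b'x' * 99 + b'\\n'\n"
    "out = sys.stdout.buffer\n"
    "for _ in range(n):\n"
    "    out.write(line)\n"
)


def via_pipe(n: int) -> int:
    """Consumer reads the producer's STDOUT while it runs (Process 2 style)."""
    cmd = [sys.executable, "-c", PRODUCER, str(n)]
    with subprocess.Popen(cmd, stdout=subprocess.PIPE) as proc:
        total = sum(len(line) for line in proc.stdout)
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, cmd)
    return total


def via_file(n: int) -> int:
    """Producer writes a file first, consumer reads it after (Process 1 style)."""
    cmd = [sys.executable, "-c", PRODUCER, str(n)]
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "out.txt")
        with open(path, "wb") as f:
            subprocess.run(cmd, stdout=f, check=True)
        with open(path, "rb") as f:
            return sum(len(line) for line in f)


def timed(fn, n):
    """Run fn(n) and return (bytes read, elapsed seconds)."""
    start = time.perf_counter()
    nbytes = fn(n)
    return nbytes, time.perf_counter() - start


if __name__ == "__main__":
    for fn in (via_pipe, via_file):
        nbytes, secs = timed(fn, 200_000)
        print(f"{fn.__name__}: {nbytes} bytes in {secs:.2f} s")
```

Whichever variant wins on a given machine says little about the general case; as the replies below point out, the outcome depends on many variables.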
priyadarshikunal
Premium Member
Posts: 1735
Joined: Thu Mar 01, 2007 5:44 am
Location: Troy, MI

Post by priyadarshikunal »

It seems you are reading files in the Perl script anyway and sending the data to STDOUT (correct me if that assumption is wrong). So that requires the same read plus one additional step. Even if Perl is generating the records on the fly, that requires additional processing.
Priyadarshi Kunal

Genius may have its limitations, but stupidity is not thus handicapped. :wink:
eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

This all sounds like a comparison that doesn't really make sense. There are far too many variables. Certainly, we always try to avoid disk I/O when we can, but how the data is written, what is in it, how big it is, whether it is encrypted or not, record lengths, memory passing strategies, what program is doing the writing, how much time you actually have for the movement, etc. etc. etc. and much much more are going to come into play. There isn't a "general rule" you can apply here. Take a big picture look at the Job and see if the overall task is being approached in the right way for the best performance. ...and then you can decide where you can make the most gains if it isn't moving data as fast as you would like, or if you even need to.

Ernie
Ernie Ostic

blogit!
Open IGC is Here! (https://dsrealtime.wordpress.com/2015/0 ... ere/)
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Emphasis mine:
eostic wrote:There are far too many variables.
If there was anything that I wanted to add to my previous post, this was it. And Ernie's full post details it nicely.
-craig

"You can never have too many knives" -- Logan Nine Fingers
pavan5035
Participant
Posts: 4
Joined: Fri Jun 12, 2009 4:43 am
Location: HYD

Post by pavan5035 »

Sorry, I am unable to see the premium post. Could it be made available free, just this once?

Thanks in advance!
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

How many more "for one time" requests will you make? Get a premium membership; this is the mechanism that keeps DSXchange alive (funded).
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

The subject of your Premium Membership was moved here and any further conversation on that topic needs to happen there. Let's leave this one for your External Source performance topic.
-craig

"You can never have too many knives" -- Logan Nine Fingers