Performance improvement for reading a file via FTP Plug-in

sumesh.abraham
Participant
Posts: 153
Joined: Thu May 11, 2006 1:52 am
Location: Bangalore

Post by sumesh.abraham »

Hello,

I am using the FTP Plug-in stage to read a file from a mainframe server into the DataStage server. The job has an FTP Plug-in stage followed by a Transformer stage, in which I map each column using substring functions.
The file contains 1.4 million records, and the FTP takes 70 minutes to complete. The bottleneck appears to be the FTP step.

Since all the existing jobs use the FTP Plug-in, I would prefer not to replace it with an FTP script. What performance improvement techniques can I use to tune the existing job? Are there any options for the FTP Plug-in that can improve read performance?

I appreciate your input!
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

I would first ensure that the FTP is indeed the bottleneck in this case. If you make a copy of your job and go straight from your FTP stage to a Peek stage, does the job still take 70 minutes?

Post by sumesh.abraham »

I ran the job both ways:
1. The job FTPs the file and writes to a Peek stage. In this case it processed 10148 rows per second.

2. The job FTPs the file, passes it to a Transformer stage where the columns are mapped with substring functions, and writes to a dataset. Here it processed 10096 rows per second.

Both jobs took almost the same time to complete: 24 minutes.

The job is running on a 4-node configuration. Please advise what tuning steps can be taken to reduce the time taken for the FTP. Thanks!

Post by ArndW »

You've now established that the bottleneck is FTP and not the subsequent processing.

How long does it take to FTP the file from the command line?
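
One way to take that measurement outside DataStage is a short script around Python's standard `ftplib`. This is only a sketch: `timed_retrieve` and `rows_per_second` are illustrative helper names, the host, credentials, and file names in the commented usage are placeholders, and it assumes a binary-mode transfer (a text file coming from a mainframe may need ASCII mode via `retrlines` instead).

```python
import time
from ftplib import FTP  # only needed for the commented usage below


def timed_retrieve(ftp, remote_name, local_path):
    """Download remote_name over an ftplib-style connection and time it.

    Returns a tuple (bytes_received, elapsed_seconds).
    """
    received = 0
    start = time.perf_counter()
    with open(local_path, "wb") as out:

        def on_block(block):
            # Called by ftplib once per received block of bytes.
            nonlocal received
            out.write(block)
            received += len(block)

        ftp.retrbinary(f"RETR {remote_name}", on_block)
    return received, time.perf_counter() - start


def rows_per_second(row_count, elapsed_seconds):
    """Average throughput in rows per second."""
    return row_count / elapsed_seconds


# Hypothetical usage; host, user, password and file names are placeholders:
#
#   with FTP("mainframe.example.com") as ftp:
#       ftp.login("user", "password")
#       size, secs = timed_retrieve(ftp, "REMOTE.FILE", "local_copy.dat")
#       print(f"{size} bytes in {secs:.1f} s")
```

For scale, the figures in the original post (1.4 million records in 70 minutes) work out to `rows_per_second(1_400_000, 70 * 60)`, roughly 333 rows per second.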

Post by sumesh.abraham »

Interestingly, it took 45 minutes when I FTP'd the file from the command line.
This indicates that the issue is not at the DataStage level and needs to be fixed at the FTP server level. Any thoughts?
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Talk to your Admins.
-craig

"You can never have too many knives" -- Logan Nine Fingers

Post by ArndW »

Since FTP is a pretty simple protocol, you probably won't be able to do much unless there are huge numbers of transmission errors.

Normally the correct approach is to compress the file prior to transmission. The DataStage FTP stage is then no longer the right tool, because the compressed file is binary; instead you would do an external FTP from the command line and just read in the compressed or uncompressed flat file for processing. Compression will most likely reduce your transmission time significantly.
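
To get a rough feel for how much compression could help, a sample of the file can be run through `gzip` before committing to the approach. The sketch below uses synthetic fixed-width records (a short key padded with spaces), not the real data, so the ratio it prints is illustrative only; `make_fixed_width_sample` and `compression_ratio` are hypothetical helper names.

```python
import gzip


def make_fixed_width_sample(n_records=1000, record_len=200):
    """Build synthetic fixed-width records: a short key, then space padding."""
    lines = []
    for i in range(n_records):
        key = f"{i:010d}CUST{i % 97:03d}"
        lines.append(key.ljust(record_len))
    return ("\n".join(lines) + "\n").encode("ascii")


def compression_ratio(data):
    """Return compressed size divided by original size (smaller is better)."""
    return len(gzip.compress(data)) / len(data)


sample = make_fixed_width_sample()
ratio = compression_ratio(sample)
print(f"original: {len(sample)} bytes, compressed/original ratio: {ratio:.3f}")
```

Running `compression_ratio` on a representative extract of the actual file gives a more trustworthy number than any synthetic sample.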

Post by chulett »

Could be, especially if there is a lot of "white space" in the file. You'd have to factor in the time to compress and then uncompress the file, but it's certainly worth a shot.
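
That trade-off can be put into a quick back-of-the-envelope calculation. The sketch below assumes transfer time scales linearly with file size; the 45 minutes is the command-line figure reported earlier in the thread, while the compression ratio and the compress/uncompress times are made-up example values.

```python
def total_minutes_with_compression(plain_transfer_minutes, ratio,
                                   compress_minutes, uncompress_minutes):
    """Estimate end-to-end time when the file is compressed before transfer.

    Assumes transfer time is proportional to the number of bytes sent.
    """
    return compress_minutes + plain_transfer_minutes * ratio + uncompress_minutes


def worth_compressing(plain_transfer_minutes, ratio,
                      compress_minutes, uncompress_minutes):
    """True if the compressed route is estimated to be faster overall."""
    estimate = total_minutes_with_compression(
        plain_transfer_minutes, ratio, compress_minutes, uncompress_minutes)
    return estimate < plain_transfer_minutes


# 45 minutes uncompressed (from the thread); the other values are hypothetical.
estimate = total_minutes_with_compression(45, 0.25, 3, 2)
print(estimate)  # 16.25
print(worth_compressing(45, 0.25, 3, 2))
```

If the file barely compresses (a ratio close to 1), the extra compress and uncompress steps can make the overall run slower rather than faster.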
FranklinE
Premium Member
Posts: 739
Joined: Tue Nov 25, 2008 2:19 pm
Location: Malvern, PA

Post by FranklinE »

I vote for blaming the hardware, myself. I use FTP for all of my jobs' first stage (links to transformers before file output), and I routinely see performance of 30,000 rows per second or greater.

I wonder, too, if there could be a performance difference between the plug-in and FTP Enterprise, which is the stage I use. Anyone with experience with both?
Franklin Evans
"Shared pain is lessened, shared joy increased. Thus do we refute entropy." -- Spider Robinson

Using mainframe data FAQ: viewtopic.php?t=143596
Using CFF FAQ: viewtopic.php?t=157872