
Performance improvement for reading a file via FTP Plug-in

Posted: Mon Nov 19, 2012 3:30 am
by sumesh.abraham
Hello,

I am using the FTP Plug-in stage to read a file from a mainframe server into the DataStage server. The job has an FTP Plug-in stage followed by a Transformer stage in which I map each column using substring functions.
There are 1.4 million records in the file, and the FTP takes 70 minutes to complete. It looks like the bottleneck is the FTP itself.
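
For what it's worth, the mapping itself is simple fixed-width slicing; in Python terms it is roughly the following (the column names and offsets here are invented for illustration):

# Rough Python equivalent of the per-row substring mapping done in
# the Transformer stage (column names and offsets are made up).
def split_record(line: str) -> dict:
    return {
        "account": line[0:10].strip(),
        "name":    line[10:40].strip(),
        "balance": line[40:52].strip(),
    }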

Since all the existing jobs use the FTP Plug-in, I would prefer to avoid replacing it with an FTP script. What performance tuning techniques can I use on the existing job? Are there any FTP Plug-in options that can improve read performance?

Appreciate your inputs!

Posted: Mon Nov 19, 2012 4:31 am
by ArndW
I would first confirm that the FTP is indeed the bottleneck here. If you make a copy of your job and go straight from your FTP stage to a Peek stage, does the job still take 70 minutes?

Posted: Mon Nov 19, 2012 7:53 am
by sumesh.abraham
I ran the job both ways:
1. Have the job FTP the file and write to a Peek stage. In this case the job processed 10,148 rows per second.

2. Have the job FTP the file, pass it to the Transformer stage where the columns are mapped through substring functions, and write to a dataset. Here the job processed 10,096 rows per second.

Both jobs took almost the same time to complete - 24 minutes.

The job is running on a 4-node configuration. Please advise what tuning steps can be taken to reduce the time the FTP takes. Thanks!

Posted: Mon Nov 19, 2012 7:56 am
by ArndW
You've now established that the bottleneck is FTP and not the subsequent processing.

How long does it take to FTP the file from the command line?
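
If you want a repeatable measurement outside DataStage entirely, a throwaway script along these lines would time the raw transfer (the host, login and dataset name below are placeholders; substitute your own):

# Time a raw FTP download with no DataStage involvement.
# All connection details here are placeholders.
import time
from ftplib import FTP

ftp = FTP("mainframe.example.com")
ftp.login("myuser", "mypassword")

start = time.time()
with open("testfile.dat", "wb") as out:
    ftp.retrbinary("RETR 'PROD.DATA.FILE'", out.write)
elapsed = time.time() - start

ftp.quit()
print("Transfer took %.1f minutes" % (elapsed / 60))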

Posted: Mon Nov 19, 2012 9:08 am
by sumesh.abraham
Interestingly, it took 45 minutes when I FTP'd the file from the command line.
This indicates it is not an issue at the DataStage level and needs to be fixed at the FTP server level. Any thoughts?

Posted: Mon Nov 19, 2012 9:58 am
by chulett
Talk to your Admins.

Posted: Mon Nov 19, 2012 10:58 am
by ArndW
Since the FTP protocol is a pretty simple one, unless there are huge numbers of transmission errors you probably won't be able to do much.

Normally the correct approach is to compress the file prior to transmission, but then the DataStage FTP stage isn't the right one to use, due to the binary nature of the compressed file. Instead you would do an external FTP from the command line and just read in the compressed or decompressed flat file for processing. Compression will most likely reduce your transmission time significantly.
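
As a rough sketch of that approach (this assumes the file has already been gzipped on the remote side, which may or may not be easy to arrange on your mainframe; the host and file names are placeholders):

# Pull a pre-compressed file in binary mode, then unpack it so the
# DataStage job can read an ordinary flat file. All names below are
# placeholders.
import gzip
import shutil
from ftplib import FTP

ftp = FTP("mainframe.example.com")
ftp.login("myuser", "mypassword")

# Binary mode matters: a text-mode transfer would corrupt the
# compressed stream.
with open("input.dat.gz", "wb") as out:
    ftp.retrbinary("RETR input.dat.gz", out.write)
ftp.quit()

# Decompress locally; the DataStage job then reads input.dat as usual.
with gzip.open("input.dat.gz", "rb") as src, open("input.dat", "wb") as dst:
    shutil.copyfileobj(src, dst)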

Posted: Mon Nov 19, 2012 11:07 am
by chulett
Could be, especially if there is a lot of "white space" in the file. You'd have to factor in the time to compress and then uncompress the file, but it's certainly worth a shot.

Posted: Mon Nov 19, 2012 1:14 pm
by FranklinE
I vote for blaming the hardware, myself. I use FTP as the first stage in all of my jobs (linked to Transformers before file output), and I routinely see performance of 30,000 rows per second or greater.

I wonder, too, if there could be a performance difference between the plug-in and FTP Enterprise, which is the stage I use. Anyone with experience with both?