Performance improvement for reading a file via FTP Plug-in
-
- Participant
- Posts: 153
- Joined: Thu May 11, 2006 1:52 am
- Location: Bangalore
Hello,
I am using the FTP Plug-in stage to read a file from a mainframe server into the DataStage server. The job has an FTP Plug-in stage followed by a Transformer stage, where I map each column using substring functions.
There are 1.4 million records in the file, and it takes 70 minutes to complete the FTP. The bottleneck appears to be the FTP transfer.
Since all the existing jobs use the FTP Plug-in, I would prefer not to replace it with an FTP script. What techniques can I use to tune the existing job? Are there any FTP Plug-in options that can improve read performance?
Appreciate your inputs!
I would first make sure that FTP really is the bottleneck here. If you make a copy of your job that goes straight from the FTP stage to a Peek stage, does the job still take 70 minutes?
-
- Participant
- Posts: 153
- Joined: Thu May 11, 2006 1:52 am
- Location: Bangalore
I ran the job both ways:
1. FTP the file and write to a Peek stage. Throughput was 10,148 rows per second.
2. FTP the file, pass it to a Transformer stage where the columns are mapped with substring functions, and write to a Data Set. Throughput was 10,096 rows per second.
Both jobs took almost the same time to complete: 24 minutes.
The job runs on a 4-node configuration. Please advise what tuning steps I can take to reduce the time taken by the FTP. Thanks!
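A quick way to put the two runs on a common footing is to compute how much throughput the Transformer and Data Set write actually cost. The sketch below uses only the two rates reported above; the helper name `relative_overhead` is hypothetical. A difference of about half a percent is what points away from the downstream processing and toward the FTP read.

```python
def relative_overhead(baseline_rps: float, with_stage_rps: float) -> float:
    """Fraction of throughput lost when extra processing is added to the job."""
    return (baseline_rps - with_stage_rps) / baseline_rps


# Rows-per-second figures from the two test runs.
ftp_to_peek = 10148         # FTP stage -> Peek stage
ftp_to_transformer = 10096  # FTP stage -> Transformer (substrings) -> Data Set

overhead = relative_overhead(ftp_to_peek, ftp_to_transformer)
print(f"Transformer + Data Set overhead: {overhead:.2%}")  # prints ... 0.51%
```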
You've now established that the bottleneck is FTP and not the subsequent processing.
How long does it take to FTP the file from the command line?
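If you want a repeatable command-line timing to compare against the job, here is a minimal sketch using Python's standard `ftplib` (not part of DataStage). The host, user, password, dataset name and local file are placeholders you supply. It does a binary (image) transfer, so no EBCDIC-to-ASCII translation is done; treat the output file as a timing artifact only, not as input for the existing job.

```python
import sys
import time
from ftplib import FTP


def timed_download(ftp, remote_name, out, blocksize=65536):
    """Binary-mode RETR of remote_name into the writable binary file object
    `out` over an already logged-in session. Returns (bytes, seconds)."""
    received = 0

    def on_block(block):
        nonlocal received
        out.write(block)
        received += len(block)

    start = time.perf_counter()
    ftp.retrbinary(f"RETR {remote_name}", on_block, blocksize=blocksize)
    return received, time.perf_counter() - start


if __name__ == "__main__" and len(sys.argv) == 6:
    # Arguments (all placeholders): host user password remote_name local_file
    host, user, password, remote_name, local_file = sys.argv[1:]
    with FTP(host) as ftp, open(local_file, "wb") as out:
        ftp.login(user, password)
        nbytes, secs = timed_download(ftp, remote_name, out)
    rate = nbytes / max(secs, 1e-9) / 1e6
    print(f"{nbytes} bytes in {secs:.1f} s ({rate:.2f} MB/s)")
```

Taking the session object as a parameter keeps the timing logic separate from connection setup, so the same function can be reused with a different login or block size.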
-
- Participant
- Posts: 153
- Joined: Thu May 11, 2006 1:52 am
- Location: Bangalore
Since FTP is a fairly simple protocol, unless there are large numbers of transmission errors, there probably isn't much you can tune.
The usual approach is to compress the file before transmission. The DataStage FTP stage isn't suitable for that because the compressed file is binary, so you would run an external FTP from the command line and then read the compressed or uncompressed flat file into the job for processing. Compression will most likely reduce your transmission time significantly.
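A minimal sketch of the receiving side, assuming the file has been compressed into gzip format before the command-line FTP (binary mode) fetches it. Which compressor is available on the mainframe side, and whether it can write gzip, is a site-specific assumption. `compression_ratio` estimates on a sample extract how many fewer bytes would cross the wire, and `gunzip_to_flat_file` expands the transferred file into the flat file the job then reads. Both function names and the file paths are hypothetical.

```python
import gzip
import shutil


def compression_ratio(sample: bytes, level: int = 6) -> float:
    """Compressed size / original size for a representative sample of records.
    A ratio of 0.2 means roughly 5x fewer bytes to send over the network."""
    return len(gzip.compress(sample, compresslevel=level)) / len(sample)


def gunzip_to_flat_file(gz_path: str, flat_path: str) -> None:
    """Stream-decompress the transferred .gz file into the flat file the
    DataStage job reads, without loading the whole file into memory."""
    with gzip.open(gz_path, "rb") as src, open(flat_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
```

Fixed-width mainframe extracts tend to contain long runs of padding, which is why compressing before transfer can pay off; measuring the ratio on a real sample shows whether it does for your file.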
I vote for blaming the hardware, myself. I use FTP as the first stage of all my jobs (linked to Transformers ahead of the file output), and I routinely see 30,000 rows per second or more.
I wonder, too, if there could be a performance difference between the plug-in and FTP Enterprise, which is the stage I use. Anyone with experience with both?
Franklin Evans
"Shared pain is lessened, shared joy increased. Thus do we refute entropy." -- Spider Robinson
Using mainframe data FAQ: viewtopic.php?t=143596 Using CFF FAQ: viewtopic.php?t=157872