
Best place to keep ETL data files in Linux?

Posted: Tue Mar 22, 2011 12:36 pm
by gsbrown
We're in the process of migrating from DataStage 7.5.1a on Windows to Information Server 8.5 on Linux.

The biggest change, next to 7.5 to 8.5, is the Windows to Linux migration. Normally, all of our sequential file data is stored in its own folder on a D:\ partition on the Windows server. What's the best practice for working file storage in the Linux environment? I'm trying to determine the best folder location so that it doesn't "step" on anything else and can be accessed by all developers. I've seen several suggestions like "/etc", "/usr/share", and "/shared/home", but I'd like to know where, in your experience, the best location is.

New to the Linux world! Thanks for your help

Posted: Tue Mar 22, 2011 1:13 pm
by ray.wurlod
Anywhere that (a) has enough space and (b) is not on the root file system. Avoid /tmp too.
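
A minimal sketch of how those two conditions could be checked for a candidate directory, assuming Python is available on the server. The function names are illustrative only and not part of DataStage. Comparing device IDs is a rough test: it tells you whether the path shares a device with "/", which may not capture every setup (bind mounts, for example).

```python
import os
import shutil


def on_root_filesystem(path):
    """Return True if path lives on the same device as "/"."""
    return os.stat(path).st_dev == os.stat("/").st_dev


def check_data_dir(path, min_free_bytes):
    """Return a list of warnings for a candidate data directory.

    min_free_bytes is whatever threshold you consider "enough space";
    the right value depends on your data volumes.
    """
    warnings = []
    if on_root_filesystem(path):
        warnings.append(f"{path} is on the root file system")
    free = shutil.disk_usage(path).free
    if free < min_free_bytes:
        warnings.append(f"{path} has only {free} bytes free")
    return warnings
```

Calling check_data_dir on the directory you have in mind, with a threshold in bytes, returns an empty list when neither warning applies.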

Posted: Tue Mar 22, 2011 1:39 pm
by chulett
Do something similar to what you are doing now: a dedicated, separate partition for your data files would be typical. So, as noted, not something on the root filesystem and not where DataStage is installed... someplace where filling it up would not be catastrophic.

Posted: Tue Mar 22, 2011 1:51 pm
by gsbrown
Thanks, that helps a lot!

We have a high availability environment where the applications and data will be stored on a SAN. There's plenty of space there to hold the IIS installation and other 3rd party apps, and still leave room for data files. I like the idea of keeping the data files local to DataStage and not venturing out to a remote location.

We'll stay out of the /root and /tmp directories and create a new unique folder.

Posted: Tue Mar 22, 2011 2:10 pm
by cdp
The InfoSphere DataStage Parallel Framework Standard Practices redbook has quite a good section on suggested practice around project setup and staging file storage. It's available here: http://www.redbooks.ibm.com/abstracts/s ... .html?Open

Thanks...Jonathan

Posted: Tue Mar 22, 2011 3:49 pm
by chulett
I specifically meant a new partition or mount point rather than simply a folder; they are more akin to your existing (I assume) separate drive letter. Data files will need to be 'local' to the ETL server to be accessible, but make sure you understood my point about not using the same area that DataStage is installed in... accidentally filling that up can seriously damage your projects.

Posted: Wed Mar 23, 2011 8:18 am
by PaulVL
I'd also ensure that it's not on the same mount where your "Project" is located.

We have 4 mounts.

1) Tool binaries
2) Projects
3) Project workspace (where all transient data gets put)
4) Backups
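
A rough sketch of how a layout like this could be sanity-checked, assuming Python on the server. The function name and any paths you pass in are hypothetical. It reports which of the named locations share a device, so an empty result suggests each one sits on its own mount.

```python
import os
from itertools import combinations


def shared_devices(mounts):
    """Return pairs of names whose paths sit on the same device.

    mounts maps a label (e.g. "projects") to a directory path.
    """
    devices = {name: os.stat(path).st_dev for name, path in mounts.items()}
    return [
        (a, b)
        for a, b in combinations(sorted(devices), 2)
        if devices[a] == devices[b]
    ]
```

Pass it a dictionary of label to path for the four areas; any pair it returns is sharing a file system and could fill up together.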

Posted: Wed Mar 23, 2011 9:16 am
by chulett
Which is what I meant by 'DataStage' - Engine, Projects, etc.