Best place to keep ETL data files in Linux?

gsbrown
Premium Member
Posts: 148
Joined: Mon Sep 23, 2002 1:00 pm
Location: USA

Post by gsbrown »

We're in the process of migrating from DataStage 7.5.1a on Windows to Information Server 8.5 on Linux.

The biggest change, next to 7.5 to 8.5, is the Windows to Linux migration. Normally, all of our sequential file data is stored in its own folder on a D:\ partitioned drive on the Windows server. What's the best practice for working file storage in the Linux environment? I'm trying to determine the best folder location so that it doesn't "step" on anything else and can be accessed by all developers. I've seen several suggestions like "/etc", "/usr/share", and "/shared/home", but I'd like to know where, in your experience, the best location is.

New to the Linux world! Thanks for your help.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Anywhere that (a) has enough space and (b) is not on the root file system. Avoid /tmp too.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Do something similar to what you are doing now: a dedicated, separate partition for your data files would be typical. So as noted, not something off the root filesystem and not where DataStage is installed... someplace where filling it up would not be catastrophic.
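As a rough illustration of the criteria so far (enough space, not on the root file system, not on the file system where DataStage is installed), here is a minimal Python sketch. The function name `check_data_dir` and its parameters are hypothetical and not part of any DataStage tooling. It assumes that two paths sit on the same file system when `os.stat` reports the same device ID, which holds for ordinary partitions and logical volumes but may not hold for bind mounts or some subvolume setups.

```python
import os
import shutil


def check_data_dir(data_dir, install_dir, min_free_bytes):
    """Return a list of problems found with a candidate data directory.

    data_dir, install_dir and min_free_bytes are supplied by the caller;
    nothing here is specific to DataStage.
    """
    problems = []
    data_dev = os.stat(data_dir).st_dev

    # Same device ID as "/" means the directory lives on the root file system.
    if data_dev == os.stat("/").st_dev:
        problems.append("data directory is on the root file system")

    # Same device ID as the install directory means filling one fills the other.
    if data_dev == os.stat(install_dir).st_dev:
        problems.append("data directory shares a file system with the install directory")

    # shutil.disk_usage reports total, used and free bytes for the file system.
    if shutil.disk_usage(data_dir).free < min_free_bytes:
        problems.append("not enough free space")

    return problems
```

Calling it with your own candidate data path, your install path, and a minimum free-space figure returns an empty list when none of the three checks fails.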
-craig

"You can never have too many knives" -- Logan Nine Fingers
gsbrown
Premium Member
Posts: 148
Joined: Mon Sep 23, 2002 1:00 pm
Location: USA

Post by gsbrown »

Thanks, that helps a lot!

We have a high availability environment where the applications and data will be stored on a SAN. There's plenty of space there to hold the IIS installation and other 3rd party apps, and still leave room for data files. I like the idea of keeping the data files local to DataStage and not venturing out to a remote location.

We'll stay out of the /root and /tmp directories and create a new unique folder.
cdp
Premium Member
Posts: 113
Joined: Tue Dec 15, 2009 9:28 pm
Location: New Zealand

Post by cdp »

If you have a look at the InfoSphere DataStage Parallel Framework Standard Practices Redbook (available here: http://www.redbooks.ibm.com/abstracts/s ... .html?Open), there's quite a good section on suggested practice around project setup and staging file storage.

Thanks...Jonathan
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

I specifically meant a new partition or mount point rather than simply a folder; those are more akin to your existing (I assume) separate drive letter. Data files will need to be 'local' to the ETL server to be accessible, but make sure you understood my point about not using the same area that DataStage is installed in... accidentally filling that up can seriously damage your projects.
-craig
PaulVL
Premium Member
Posts: 1315
Joined: Fri Dec 17, 2010 4:36 pm

Post by PaulVL »

I'd also ensure that it's not on the same mount where your "Project" is located.

We have 4 mounts.

1) Tool binaries
2) Projects
3) Project workspace (where all transient data gets put)
4) Backups
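
One way to confirm that a layout like the four mounts above really does sit on separate file systems is to compare device IDs. This is only a sketch: the function name `shared_mounts` and the labels passed to it are hypothetical, and it assumes a matching `st_dev` means a shared file system.

```python
import os
from collections import defaultdict


def shared_mounts(dirs):
    """Return groups of labels whose directories share a file system.

    dirs maps a label (for example "projects") to a directory path.
    An empty result means every directory is on its own device.
    """
    by_device = defaultdict(list)
    for label, path in dirs.items():
        # st_dev identifies the device (file system) that holds the path.
        by_device[os.stat(path).st_dev].append(label)

    # Keep only the devices that hold more than one of the directories.
    return [sorted(labels) for labels in by_device.values() if len(labels) > 1]
```

Passing a dictionary with one entry each for binaries, projects, workspace, and backups should give back an empty list if the four are truly separate.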
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Which is what I meant by 'DataStage' - Engine, Projects, etc.
-craig