how many tables can be loaded using single job


sathyak098
Participant
Posts: 23
Joined: Tue May 14, 2013 6:34 am

how many tables can be loaded using single job

Post by sathyak098 »

Hello
Just need an idea of how many tables can be loaded using a single job.
- Tables are independent
- Approx. each table may receive records in the thousands per run
- Inserts only
In my case, I have source data of different types that needs to be loaded into more than 10 tables based on the type.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

There's no DataStage limit.
-craig

"You can never have too many knives" -- Logan Nine Fingers
greggknight
Premium Member
Posts: 120
Joined: Thu Oct 28, 2004 4:24 pm

Post by greggknight »

Well, as previously stated, DataStage has no limit. That being said, I have never loaded, and would never load, that many tables within the same job unless it was a one-time load, and even then it would have to be advantageous to me.

For instance, we load 180 dimensions every morning, times two because two different companies use different instance IDs. That's not counting all the facts, but they are in a different batch. To do that I have 180 LOAD jobs, which are run from a batch job. I can set a parameter to run as many of these jobs in parallel as my system's resources will allow. Scalability. Within this process, if 1 of the load jobs fails, the remaining 179 jobs will still run, and then and only then will the batch fail. During my restart I need only run the 1 job that failed.

The one thing to bear in mind is that the simpler the job, the better off you are, from a debugging and testing standpoint as well as a running standpoint. Your job will only run as fast as its slowest object.

(As an example, we are running 8,261 job instances every night in our data warehouse build.)
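The batch pattern described above can be sketched in a few lines: run independent load jobs with a concurrency cap, let every job finish even if some fail, and only fail the batch afterwards so a restart reruns just the failures. This is a minimal Python sketch, not DataStage code; `run_load_job` is a stand-in for launching a real LOAD job, and all names are illustrative assumptions.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def run_load_job(table):
    """Placeholder for one LOAD job; return True on success."""
    return table != "DIM_BROKEN"   # simulate a single failing job

def run_batch(tables, max_parallel=4):
    """Run all jobs with a parallelism cap; collect failures, report at the end."""
    failures = []
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        futures = {pool.submit(run_load_job, t): t for t in tables}
        for fut in as_completed(futures):
            if not fut.result():
                failures.append(futures[fut])
    return failures   # an empty list means the whole batch succeeded

failed = run_batch(["DIM_A", "DIM_B", "DIM_BROKEN", "DIM_C"])
print(failed)   # only the failed job needs a rerun
```

The key design point is that one failure does not stop the other jobs, and the restart list is exactly the set of failures.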
"Don't let the bull between you and the fence"

Thanks
Gregg J Knight

"Never Never Never Quit"
Winston Churchill
asorrell
Posts: 1707
Joined: Fri Apr 04, 2003 2:00 pm
Location: Colleyville, Texas

Post by asorrell »

Most of the limits you'll encounter are not DataStage related. You'll probably hit limits on database connections as well as your hardware capacity that will limit the number of tables you can process at once.
Andy Sorrell
Certified DataStage Consultant
IBM Analytics Champion 2009 - 2020
sathyak098
Participant
Posts: 23
Joined: Tue May 14, 2013 6:34 am

Post by sathyak098 »

I receive no more than 10k records (sometimes 0) per table per run, and each table is truncate-and-load. I need a suggestion on which of the options below is better:
1. Load all 10 tables in a single job.
2. Design 10 individual jobs and run them in parallel.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

It is much better to do 10 tables in one job; and 10 tables is a very small number.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

I would prefer to use separate jobs, even if they are separate invocations of a single generic job design.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

I should have asked what exactly you mean by "loading a table" before responding.

I just finished writing a job this morning that uses 20 database stages to execute twenty different internal SQL table copies, and without thinking I responded with that job in mind - not a good answer if you are doing DataStage processing in your job.

If your tables use different sources and have different processing rules then I would recommend, as does Ray, one target table per job.

If, for example, you had one large source file/table that gets processed and written to your 10 different targets, you would save time by reading the source data only once, making a single job more efficient.
sathyak098
Participant
Posts: 23
Joined: Tue May 14, 2013 6:34 am

Post by sathyak098 »

My job reads a large file. I have a column from which I identify the type. Based on the type, I segregate the data in a Transformer and take 10 different output links to the target tables.
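The Transformer design described above amounts to one pass over the source rows, routing each record to a per-type bucket the way constrained output links route rows to target tables. A rough Python analogue, with the `type` field name assumed for illustration:

```python
from collections import defaultdict

def split_by_type(rows, type_field="type"):
    """Single pass over the source; each distinct type feeds its own bucket,
    like one constrained output link per target table."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[row[type_field]].append(row)
    return buckets

rows = [{"type": "A", "val": 1}, {"type": "B", "val": 2}, {"type": "A", "val": 3}]
buckets = split_by_type(rows)
print(sorted(buckets))   # ['A', 'B'] - each key would feed its own target table
```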
shiva_reddys447
Participant
Posts: 21
Joined: Sat Sep 08, 2007 12:04 am
Location: bangalore

Post by shiva_reddys447 »

It's ideal to use ten different jobs (or a single job with multiple invocations) to load the 10 different tables, and you can trigger those 10 jobs in parallel from a sequence (again, you can have load balancing).

This provides more flexibility and control over each table for any future modifications or tracking.

It's not a good idea to open up 10 DB connections in a single job.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

In this case it would seem that your best approach is to do everything in one job. If your source table can be SELECTed based on that column you could write 10 jobs (or one generic one called 10 times with different parameters) since each job won't read the whole table.

But I would suggest doing one job with 11 connections to a database; that is well within the capabilities of even the smallest DataStage installation.
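The "one generic job, many invocations" idea above can be sketched as a single parameterized routine where the type code drives the WHERE clause, so each invocation reads only its slice of the source table. This is an illustrative sketch; the table and column names are assumptions, and a real job would run the SQL through its database connector stage.

```python
def build_select(table, type_code):
    """Build a parameterized SELECT; a bind-style placeholder keeps the
    SQL generic and safe across invocations."""
    return (f"SELECT * FROM {table} WHERE record_type = ?", (type_code,))

def run_generic_load(type_code):
    """One invocation of the 'generic job': fetch only this type's rows.
    SRC_STAGING and record_type are hypothetical names."""
    sql, params = build_select("SRC_STAGING", type_code)
    # A real job would execute sql/params and write to the matching target.
    return sql, params

sql, params = run_generic_load("A")
print(sql)   # SELECT * FROM SRC_STAGING WHERE record_type = ?
```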
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Source file, Arnd. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Still, I would advocate separate jobs. The file can be read simultaneously and (as noted) an issue with one load won't affect or cause reloads of the other tables.

However, I'm going to disagree with one of shiva's points:
shiva_reddys447 wrote:Its not a good idea to open up 10 db connections in a single job
That would be an opinion. I, for one, don't see any issue with it in general. Ultimately, it depends entirely on your infrastructure, database capabilities, and ETL job load.
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

shiva_reddys447 wrote:Its not a good idea to open up 10 db connections in a single job
Why not?

I have a job that joins data from 20 different sources. It runs quite happily.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
SURA
Premium Member
Posts: 1229
Joined: Sat Jul 14, 2007 5:16 am
Location: Sydney

Post by SURA »

If I were in your position, I would create a master job that does all the transformations for all the target tables, with dataset files as my final target. Then have one multiple-instance job with RCP to load the data.
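The two-stage pattern suggested here can be sketched as: stage 1 transforms once and writes one intermediate file per target table; stage 2 is a single generic loader run once per file (the multiple-instance job). A minimal sketch under assumed names; the file layout and loader are illustrative, not DataStage datasets or RCP themselves.

```python
import json, os, tempfile

def stage_transform(rows, out_dir):
    """Stage 1: split transformed rows into one file per target table."""
    paths = {}
    for row in rows:
        t = row["type"]
        path = os.path.join(out_dir, f"load_{t}.jsonl")
        with open(path, "a") as f:
            f.write(json.dumps(row) + "\n")
        paths[t] = path
    return paths

def stage_load(path):
    """Stage 2: generic loader; here it just counts the rows it would insert."""
    with open(path) as f:
        return sum(1 for _ in f)

out = tempfile.mkdtemp()
paths = stage_transform([{"type": "A"}, {"type": "B"}, {"type": "A"}], out)
counts = {t: stage_load(p) for t, p in paths.items()}
print(counts)   # {'A': 2, 'B': 1}
```

The benefit this mirrors is isolation: a failed load reruns from its intermediate file without redoing the transformation.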
Thanks
Ram
----------------------------------
Revealing your ignorance is fine, because you get a chance to learn.