how many tables can be loaded using single job


sathyak098
Participant
Posts: 23
Joined: Tue May 14, 2013 6:34 am

how many tables can be loaded using single job

Post by sathyak098 »

Hello
Just need an idea of how many tables can be loaded using a single job.
- Tables are independent
- Approx. each table may receive records in the thousands per run
- Inserts only
In my case, I have source data of different types that needs to be loaded into more than 10 tables based on the type.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

There's no DataStage limit.
-craig

"You can never have too many knives" -- Logan Nine Fingers
greggknight
Premium Member
Posts: 120
Joined: Thu Oct 28, 2004 4:24 pm

Post by greggknight »

Well, as previously stated, DataStage has no limit. That being said, I have never loaded, and would never load, that many tables within the same job unless it was a one-time load, and even then it would have to be advantageous to me.

For instance, we load 180 dimensions every morning, times two because two different companies use different instance IDs. That's not counting all the facts, but they are in a different batch. To do that I have 180 LOAD jobs, which are run from a batch job. I can set a parameter to run as many of these jobs in parallel as my system's resources will allow. Scalability. Within this process, if 1 of the load jobs fails, the remaining 179 jobs will still run, and then and only then will the batch fail. During my restart I need only run the 1 job that failed.

The one thing to bear in mind is that the simpler the job, the better off you are, from a debugging and testing standpoint as well as a running standpoint. Your job will only run as fast as its slowest object.

(As an example, we are running 8,261 job instances every night in our data warehouse build.)
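The batch pattern described above can be sketched in a few lines: run independent load jobs with a concurrency cap, let every job finish even if some fail, and only fail the batch afterwards so a restart reruns just the failures. This is a minimal Python sketch, not DataStage code; `run_load_job` is a stand-in for launching a real LOAD job, and all names are illustrative assumptions.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def run_load_job(table):
    """Placeholder for one LOAD job; return True on success."""
    return table != "DIM_BROKEN"   # simulate a single failing job

def run_batch(tables, max_parallel=4):
    """Run all jobs with a parallelism cap; collect failures, report at the end."""
    failures = []
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        futures = {pool.submit(run_load_job, t): t for t in tables}
        for fut in as_completed(futures):
            if not fut.result():
                failures.append(futures[fut])
    return failures   # an empty list means the whole batch succeeded

failed = run_batch(["DIM_A", "DIM_B", "DIM_BROKEN", "DIM_C"])
print(failed)   # only the failed job needs a rerun
```

The key design point is that one failure does not stop the other jobs, and the restart list is exactly the set of failures.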
"Don't let the bull between you and the fence"

Thanks
Gregg J Knight

"Never Never Never Quit"
Winston Churchill
asorrell
Posts: 1707
Joined: Fri Apr 04, 2003 2:00 pm
Location: Colleyville, Texas

Post by asorrell »

Most of the limits you'll encounter are not DataStage related. You'll probably hit limits on database connections as well as your hardware capacity that will limit the number of tables you can process at once.
Andy Sorrell
Certified DataStage Consultant
IBM Analytics Champion 2009 - 2020
sathyak098
Participant
Posts: 23
Joined: Tue May 14, 2013 6:34 am

Post by sathyak098 »

I receive no more than 10k records (sometimes 0) per table per run, and each table is truncate-and-load. I need a suggestion on which of the options below is better:
1. Load all 10 tables in a single job.
2. Design 10 individual jobs and run them in parallel.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

It is much better to do 10 tables in one job; and 10 tables is a very small number.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

I would prefer to use separate jobs, even if they are separate invocations of a single generic job design.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

I should have asked what exactly you mean by "loading a table" before responding.

I just finished writing a job this morning that uses 20 database stages to execute twenty different internal SQL table copies, and without thinking I responded with that job in mind - not a good answer if you are doing DataStage processing in your job.

If your tables use different sources and have different processing rules then I would recommend, as does Ray, one target table per job.

If, for example, you had one large source file/table that gets processed and written to your 10 different targets, you would save time by reading the source data only once, making a single job more efficient.
sathyak098
Participant
Posts: 23
Joined: Tue May 14, 2013 6:34 am

Post by sathyak098 »

My job reads a large file. I have a column from which I identify the type. Based on the type, I segregate the data in a Transformer and take 10 different output links to the target tables.
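The Transformer design described above amounts to one pass over the source rows, routing each record to a per-type bucket the way constrained output links route rows to target tables. A rough Python analogue, with the `type` field name assumed for illustration:

```python
from collections import defaultdict

def split_by_type(rows, type_field="type"):
    """Single pass over the source; each distinct type feeds its own bucket,
    like one constrained output link per target table."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[row[type_field]].append(row)
    return buckets

rows = [{"type": "A", "val": 1}, {"type": "B", "val": 2}, {"type": "A", "val": 3}]
buckets = split_by_type(rows)
print(sorted(buckets))   # ['A', 'B'] - each key would feed its own target table
```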
shiva_reddys447
Participant
Posts: 21
Joined: Sat Sep 08, 2007 12:04 am
Location: bangalore

Post by shiva_reddys447 »

It's ideal to use ten different jobs (or a single job with multiple invocations) to load the 10 different tables, and you can trigger those 10 jobs in parallel from a sequence (again, you can have load balancing).

This provides more flexibility and control over each table for any future modifications or tracking.

It's not a good idea to open up 10 DB connections in a single job.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

In this case it would seem that your best approach is to do everything in one job. If your source table can be SELECTed based on that column you could write 10 jobs (or one generic one called 10 times with different parameters) since each job won't read the whole table.

But I would suggest doing one job with 11 connections to a database; that is well within the capabilities of even the smallest DataStage installation.
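The "one generic job, many invocations" idea above can be sketched as a single parameterized routine where the type code drives the WHERE clause, so each invocation reads only its slice of the source table. This is an illustrative sketch; the table and column names are assumptions, and a real job would run the SQL through its database connector stage.

```python
def build_select(table, type_code):
    """Build a parameterized SELECT; a bind-style placeholder keeps the
    SQL generic and safe across invocations."""
    return (f"SELECT * FROM {table} WHERE record_type = ?", (type_code,))

def run_generic_load(type_code):
    """One invocation of the 'generic job': fetch only this type's rows.
    SRC_STAGING and record_type are hypothetical names."""
    sql, params = build_select("SRC_STAGING", type_code)
    # A real job would execute sql/params and write to the matching target.
    return sql, params

sql, params = run_generic_load("A")
print(sql)   # SELECT * FROM SRC_STAGING WHERE record_type = ?
```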
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Source file, Arnd. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Still, I would advocate separate jobs. The file can be read simultaneously and (as noted) an issue with one load won't affect or cause reloads of the other tables.

However, I'm going to disagree with one of shiva's points:
shiva_reddys447 wrote:Its not a good idea to open up 10 db connections in a single job
That would be an opinion. I, for one, don't see any issue with it in general. Ultimately, it depends entirely on your infrastructure, database capabilities, and ETL job load.
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

shiva_reddys447 wrote:Its not a good idea to open up 10 db connections in a single job
Why not?

I have a job that joins data from 20 different sources. It runs quite happily.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
SURA
Premium Member
Posts: 1229
Joined: Sat Jul 14, 2007 5:16 am
Location: Sydney

Post by SURA »

If I were in your position, I would create a master job that does all the transformations for all the target tables, with dataset files as my final target. Then have one multiple-instance job with RCP to load the data.
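The two-stage pattern suggested here can be sketched as: stage 1 transforms once and writes one intermediate file per target table; stage 2 is a single generic loader run once per file (the multiple-instance job). A minimal sketch under assumed names; the file layout and loader are illustrative, not DataStage datasets or RCP themselves.

```python
import json, os, tempfile

def stage_transform(rows, out_dir):
    """Stage 1: split transformed rows into one file per target table."""
    paths = {}
    for row in rows:
        t = row["type"]
        path = os.path.join(out_dir, f"load_{t}.jsonl")
        with open(path, "a") as f:
            f.write(json.dumps(row) + "\n")
        paths[t] = path
    return paths

def stage_load(path):
    """Stage 2: generic loader; here it just counts the rows it would insert."""
    with open(path) as f:
        return sum(1 for _ in f)

out = tempfile.mkdtemp()
paths = stage_transform([{"type": "A"}, {"type": "B"}, {"type": "A"}], out)
counts = {t: stage_load(p) for t, p in paths.items()}
print(counts)   # {'A': 2, 'B': 1}
```

The benefit this mirrors is isolation: a failed load reruns from its intermediate file without redoing the transformation.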
Thanks
Ram
----------------------------------
Revealing your ignorance is fine, because you get a chance to learn.