Best way to Read N number of tables and load into 1 table
Moderators: chulett, rschirm, roy
Hi there
I am developing a generic DataStage job to load data from 100 tables into a single table.
I have a control table that lists the tables to be picked up for the day. In general I get around 100 tables daily, with a total volume of around 2 million records.
I would like to know the best approach for this work.
I plan to build a generic RCP job; here is my design:
SQL Server --> TFM --> SEQ File stage
This will be a multiple-instance job; a loop in a sequence will trigger it on each iteration and load into a sequential file.
A final job will read all the sequential files using a file pattern and load them into the target table.
Is there any other approach that would give better performance?
Suggestions welcome.
DS User
Since that's the case, you might want to consider an alternative to the looping construct - perhaps you can take your table list and construct a single source query where 'union all' is used. For example:
SELECT * FROM TABLE_A
UNION ALL
SELECT * FROM TABLE_B
UNION ALL
SELECT * FROM TABLE_C
Etc, etc. Write that out to a file and then have the job leverage the 'SQL File' option, get it all done in "one" select. Assuming they can all be sourced from the same credentials / connection, that is.
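The query-building step above can be sketched outside DataStage. This is a minimal Python illustration (not DataStage-specific) of joining one SELECT per table with UNION ALL; the table names are hypothetical examples, and the resulting string would be written to the file used by the 'SQL File' option:

```python
# Build one SELECT per source table, joined by UNION ALL.
# Table names here are hypothetical placeholders.
table_names = ["TABLE_A", "TABLE_B", "TABLE_C"]

# The single source query Craig describes; write this out to the
# file that the stage's 'SQL File' option points at.
sql = "\nUNION ALL\n".join(f"SELECT * FROM {t}" for t in table_names)
```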
-craig
"You can never have too many knives" -- Logan Nine Fingers
"You can never have too many knives" -- Logan Nine Fingers
Edit the file of names to build SELECT statements around them. Or:

Execute Command activity:

    cat #filepath#

User Variables activity:

    Ereplace(Cats(Reuse("SELECT * FROM "), Command.$Output), @FM, " UNION ALL ", -1, 0)

That will get you close. You may have to trim the trailing @FM from Command.$Output before (or as part of) constructing the SQL.
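The BASIC expression above prefixes every field of Command.$Output with "SELECT * FROM " (Cats with Reuse) and then replaces the field marks with " UNION ALL " (Ereplace). A rough Python equivalent, with @FM modeled as chr(254) and the table names as hypothetical samples:

```python
# Rough Python equivalent of the BASIC expression. @FM (the field mark,
# character 254) separates the lines captured in Command.$Output.
FM = chr(254)
command_output = f"TABLE_A{FM}TABLE_B{FM}TABLE_C{FM}"  # note the trailing @FM

# Trim the trailing field mark first, as the post suggests.
names = command_output.rstrip(FM).split(FM)

# Cats(Reuse("SELECT * FROM "), ...) prefixes every name;
# Ereplace(..., @FM, " UNION ALL ") joins the pieces.
sql = " UNION ALL ".join(f"SELECT * FROM {t}" for t in names)
```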
Last edited by ray.wurlod on Tue Nov 08, 2011 10:00 pm, edited 1 time in total.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Seems to me you would just select the list of table names and then write them out to a flat file. Nothing in what I posted requires that you know anything about the table names or how many there will be on any given day, as you are (in essence) building dynamic SQL. Write out the first record with just the SELECT, then add "UNION ALL" onto all of the rest.
However, you may run into a problem with the size of the file if your number of tables can go that high. But then I wonder if in a Sequence job the string which holds the "delimited list of things to loop through" can be that long.
-craig
"You can never have too many knives" -- Logan Nine Fingers
"You can never have too many knives" -- Logan Nine Fingers
How about passing the table name as a parameter within a sequence job?
1. job 1 -- Read control table and create delimited seq file with all the tables to be loaded.
2. begin the loop
3. parse file created in step 1 for (next) source table name with User variable activity stage
4. Using the source table name passed by user variable activity stage load target table
5. end loop
Aruna.
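The per-table loop described above can be sketched in plain Python. The `dsjob` argument layout and the job/file/parameter names below are assumptions for illustration, so the commands are only assembled here, not executed:

```python
def build_job_commands(table_names):
    """One dsjob invocation per source table, passing the
    table name as a job parameter (names are illustrative)."""
    return [
        ["dsjob", "-run", "-param", f"SRC_TABLE={t}", "PROJECT", f"LoadJob.{t}"]
        for t in table_names
    ]

# Step 1 would write this list to a sequential file; hard-coded here.
tables = ["TABLE_A", "TABLE_B"]
commands = build_job_commands(tables)   # steps 2-5: one command per loop pass
```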
The point we've been trying to make is that a sequence job with a loop means that you process one table at a time. A UNION can (theoretically anyway) stream all rows from all tables at the same time.
You could construct the SQL statement within the sequence (Routine activity) rather than in an external command.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Provided your database server can cope with an SQL statement that long, it should be OK.
You could always create a hybrid approach, such as a sequence with multiple loops each handling a subset of lines from the list of table names.
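The hybrid approach amounts to splitting the table list into fixed-size subsets. A minimal sketch, assuming a chunk size of 25 (any size that keeps each loop or UNION ALL statement manageable would do):

```python
def chunk(tables, size):
    """Yield successive subsets of at most `size` table names."""
    for i in range(0, len(tables), size):
        yield tables[i:i + size]

# e.g. 100 daily tables split into four loops of 25 each.
tables = [f"TABLE_{n}" for n in range(1, 101)]
subsets = list(chunk(tables, 25))
```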
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.