Best way to Read N number of tables and load into 1 table



SURA
Premium Member
Posts: 1229
Joined: Sat Jul 14, 2007 5:16 am
Location: Sydney


Post by SURA »

Hi there

I am developing a generic DataStage job to load data from 100 tables into a single table.

I have a control table that lists the tables to be picked up for the day. In general I may get around 100 tables daily, with a total volume of about 2 million records.

I would like to know the best approach for this work.

My plan is a generic RCP (runtime column propagation) job; here is my design.

SQL Server --> TFM --> SEQ File stage

This will be a multiple-instance job; a loop in a Sequence job will trigger it on each iteration, with each instance loading into its own sequential file.

A final job will read all the sequential files using a file pattern and load them into the target table.

Is there another approach that would give better performance than this one?

Suggestions welcome.

DS User
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

First and obvious question - do all of these tables and the target table have the same record structure?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
SURA
Premium Member
Posts: 1229
Joined: Sat Jul 14, 2007 5:16 am
Location: Sydney

Post by SURA »

All the source tables have the same record structure. In the target I am also adding metadata columns for internal tracking.

chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Since that's the case, you might want to consider an alternative to the looping construct - perhaps you can take your table list and construct a single source query where 'union all' is used. For example:

SELECT * FROM TABLE_A
UNION ALL
SELECT * FROM TABLE_B
UNION ALL
SELECT * FROM TABLE_C

Etc, etc. Write that out to a file and then have the job leverage the 'SQL File' option, get it all done in "one" select. Assuming they can all be sourced from the same credentials / connection, that is.
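[Editorial sketch, not part of Craig's post: the file of dynamic SQL he describes could be generated from the control-table list along these lines. The table names and output filename here are illustrative, not from the thread.]

```python
# Build a single UNION ALL query from a list of table names and
# write it to a file for the job's 'SQL File' option to pick up.
table_names = ["TABLE_A", "TABLE_B", "TABLE_C"]  # would come from the control table

query = "\nUNION ALL\n".join(f"SELECT * FROM {t}" for t in table_names)

with open("union_all.sql", "w") as f:
    f.write(query + "\n")
```

The same logic works whether the list holds 50 tables or 500, which is what makes the query "dynamic".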
-craig

"You can never have too many knives" -- Logan Nine Fingers
SURA
Premium Member
Posts: 1229
Joined: Sat Jul 14, 2007 5:16 am
Location: Sydney

Post by SURA »

Hi Craig

Thanks for your reply.

The table names change from day to day, and the number of tables varies: today I may pick 50, tomorrow 500, then 100.

How do I construct the SELECT for this scenario?

ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Edit the file of names to build SELECT statements around them. Or:

Execute Command activity:

    cat #filepath#

User Variables activity:

    Ereplace(Cats(Reuse("SELECT * FROM "), Command.$Output), @FM, " UNION ALL ", -1, 0)

That will get you close. You may have to trim the trailing @FM from Command.$Output before (or as part of) constructing the SQL.
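[Editorial aside: for readers unfamiliar with DataStage BASIC, the transformation Ray describes can be mimicked in Python. @FM is the field mark, CHAR(254), which separates the lines of the command output; the sample table names are illustrative.]

```python
FM = "\xfe"  # DataStage's @FM field mark is CHAR(254)

# Simulated Execute Command output: one table name per field,
# with a trailing field mark (as Ray notes, it must be trimmed).
command_output = "TABLE_A" + FM + "TABLE_B" + FM + "TABLE_C" + FM

# Cats(Reuse("SELECT * FROM "), ...) prefixes every field;
# Ereplace(..., @FM, " UNION ALL ", -1, 0) then swaps each
# remaining field mark for " UNION ALL ".
fields = command_output.rstrip(FM).split(FM)
sql = " UNION ALL ".join("SELECT * FROM " + t for t in fields)
```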
Last edited by ray.wurlod on Tue Nov 08, 2011 10:00 pm, edited 1 time in total.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Seems to me you would just select the list of table names and write them out to a flat file. Nothing in what I posted requires you to know anything about the table names or how many there will be on any given day, as you are (in essence) building dynamic SQL. Write out the first record with just the SELECT, then prepend the "UNION ALL" to all of the rest.

However, you may run into a problem with the size of the file if your number of tables can go that high. But then I wonder whether, in a Sequence job, the string which holds the "delimited list of things to loop through" can be that long. :?
SURA
Premium Member
Posts: 1229
Joined: Sat Jul 14, 2007 5:16 am
Location: Sydney

Post by SURA »

Ray

How do I execute this query?

The point of my question is: if the query is big, I can't pass it as a parameter.

SURA
Premium Member
Posts: 1229
Joined: Sat Jul 14, 2007 5:16 am
Location: Sydney

Post by SURA »

Sorry Craig,

I am not clear on your comments. I have a control table in which they give me the list of table names that need to be picked up for the day.

Aruna Gutti
Premium Member
Posts: 145
Joined: Fri Sep 21, 2007 9:35 am
Location: Boston

Post by Aruna Gutti »

How about passing the table name as a parameter within a sequence job?

1. Job 1 -- read the control table and create a delimited sequential file listing all the tables to be loaded.
2. Begin the loop.
3. Parse the file created in step 1 for the (next) source table name with a User Variables activity stage.
4. Using the source table name passed by the User Variables activity stage, load the target table.
5. End the loop.

Aruna.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

The point we've been trying to make is that a sequence job with a loop means that you process one table at a time. A UNION can (theoretically anyway) stream all rows from all tables at the same time.

You could construct the SQL statement within the sequence (Routine activity) rather than in an external command.
SURA
Premium Member
Posts: 1229
Joined: Sat Jul 14, 2007 5:16 am
Location: Sydney

Post by SURA »

I got it. Let me try it and get back to you.

Meanwhile, just for clarification:

If for some reason it has to run 800 tables, is that still not a problem with the UNION ALL approach?


ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Provided your database server can cope with an SQL statement that long, it should be OK.

You could always create a hybrid approach, such as a sequence with multiple loops each handling a subset of lines from the list of table names.
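[Editorial sketch, not part of Ray's post: the hybrid approach he suggests amounts to batching the table list and emitting one UNION ALL statement per batch. The batch size of 50 and the generated table names are arbitrary examples.]

```python
# Split a long table list into fixed-size batches, producing one
# UNION ALL statement per batch so no single statement gets too long.
def batched(items, size):
    """Yield consecutive slices of `items`, each at most `size` long."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

tables = [f"T{n}" for n in range(1, 801)]  # e.g. the 800-table case

statements = [
    " UNION ALL ".join(f"SELECT * FROM {t}" for t in batch)
    for batch in batched(tables, 50)
]
```

Each element of `statements` could then drive one iteration of the sequence loop, trading one enormous statement for a handful of manageable ones.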
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Pretty simple to test your "800 table" question: just put the same table name in your control table 800 times.