DSXchange: DataStage and IBM Websphere Data Integration Forum
View next topic
View previous topic
Add To Favorites
Author Message
bart12872
Participant



Joined: 19 Jan 2007
Posts: 82

Points: 734

Post Posted: Mon Jan 26, 2009 7:45 am Reply with quote    Back to top    

DataStage® Release: 7x
Job Type: Parallel
OS: Unix
Additional info: sort key problem
Hi,

A job cause me a huge grief because of the use of sorts.

I explain me :

In my job, I have a input with a huge numbers of lines ordered by col1.
Then, i make an inner join and an agregation on col1,col2,col3,col4.
So i use a sort stage (key sort col1,col2,col3,col4) with col1 previously sorted.

So, at this moment data as sort by col1,col2,col3,col4

After that, I need to sort by col2,col3,col4 only.

Is there a method to do this without cut the dataflow ?
Do I have to write in a dataset all data and then sort ?

thanks,
martin.
mk_ds09
Participant



Joined: 25 Jan 2009
Posts: 72
Location: Pune
Points: 518

Post Posted: Mon Jan 26, 2009 8:55 am Reply with quote    Back to top    

In order to have better design of the job..

1. There is join stage in the job...It is advised that is you are having huge unsorted data, u can use lookup stage.. ( of course ..here the other link where you are putting the join should have less rows which can fit in your physical memory or performance will degrade again ! )

2.Do not use stable sort which is much more expensive..

3.Use restirct memory clause in sort, which can improve the performance.

you have mentioned that writing the dataset and then sorting..
are you using database stages currently ?

-------
MK
Rate this response:  
Not yet rated
shamshad



Group memberships:
Premium Members

Joined: 25 Aug 2004
Posts: 147
Location: Detroit,MI
Points: 1383

Post Posted: Mon Jan 26, 2009 9:04 am Reply with quote    Back to top    

Martin,

This might not be the answer you looking for but whenever we have to sort and rearrange huge amount of data, we do it via a UNIX script rather than using the ETL Tool.

UNIX does these operation fairly quickly and efficiently and we never had any memory issues etc. The only catch is you will have to add few extra
steps in your Sequence like calling Shell script from Master Sequence etc.

After all no ETL tool is built to handle almost every situation efficiently.

_________________
Datawarehouse Consultant
Rate this response:  
Not yet rated
bart12872
Participant



Joined: 19 Jan 2007
Posts: 82

Points: 734

Post Posted: Mon Jan 26, 2009 9:26 am Reply with quote    Back to top    

mk_ds09 wrote:
In order to have better design of the job..

1. There is join stage in the job...It is advised that is you are having huge unsorted data, u can use lookup stage.. ( of course ..here the other link where you are putting the join should have less rows which can fit in your physical memory or performance will degrade again ! )

2.Do not use stable sort which is much more expensive..

3.Use restirct memory clause in sort, which can improve the performance.

you have mentioned that writing the dataset and then sorting..
are you using database stages currently ?

-------
MK


Thanks for your response.
1-well, I didn't developped key join in my join stage. The key is col1,col2. So as my input are sorted by col1, col2. The dataflow is not broken.
2- I didn't use the stable sort. In fact, I never find a situation with the need of stable sort.
3- I must admit I doesn't consider this parameter. I always let it to 20MB, the default value. Can you tell me me how you define it ?

no, i didn't use database, except to extract data.
Rate this response:  
Not yet rated
ray.wurlod

Premium Poster
Participant

Group memberships:
Premium Members, Inner Circle, Australia Usergroup, Server to Parallel Transition Group

Joined: 23 Oct 2002
Posts: 54601
Location: Sydney, Australia
Points: 296091

Post Posted: Mon Jan 26, 2009 1:33 pm Reply with quote    Back to top    

In the Sort stage mark the sort mode for Col1 "don't sort, already sorted" and sort normally by Col2, Col3 and Col4.

_________________
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Rate this response:  
Not yet rated
Display posts from previous:       

Add To Favorites
View next topic
View previous topic
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum



Powered by phpBB © 2001, 2002 phpBB Group
Theme & Graphics by Daz :: Portal by Smartor
All times are GMT - 6 Hours