Different results in 8.7 job than 8.1 version job

nikhil_bhasin
Participant
Posts: 50
Joined: Tue Jan 19, 2010 4:14 am

Post by nikhil_bhasin »

Hi All,

I am facing an unusual issue while migrating jobs from version 8.1 to version 8.7. A couple of jobs that use a Remove Duplicates stage with hash partitioning produce different results when I compare the 8.1 output with the 8.7 output.
The scenario is as follows:
Input:
colA,colB,colC,colD
A,B,C,1
A,B,D,2
B,C,D,1

Keys used for removing duplicates, hash partitioning and sorting (set in the Partitioning tab of the Remove Duplicates stage), with Duplicate To Retain = First:
colA, colB

The results are:
DS 8.1 job output
A,B,C,1
B,C,D,1

DS 8.7 job output
A,B,D,2
B,C,D,1

Each time I run either job, the record retained from each set of duplicates changes at random.

Can anyone suggest a way out of this situation? It would be a great help.
asorrell
Posts: 1707
Joined: Fri Apr 04, 2003 2:00 pm
Location: Colleyville, Texas

Post by asorrell »

This is very strange behavior. I can't say I've seen that problem on either of the two working 8.7 environments. It sounds like your job is configured correctly. I assume you've ensured that the new job has the sorts specified in the correct order (descending).

Have you switched to NLS on the new system? Can you subset some of the records in question and output them to a sequential file so you can look at them in a hex editor? I'm wondering if there are invisible characters in the field that are causing it to sort "higher".
Andy Sorrell
Certified DataStage Consultant
IBM Analytics Champion 2009 - 2020
nikhil_bhasin
Participant
Posts: 50
Joined: Tue Jan 19, 2010 4:14 am

Post by nikhil_bhasin »

The NLS settings are the same for both: ASCII_ASCL. Also, the record selected for retention keeps changing every time I run the jobs with the same source data. Is there any change in the hash algorithm between the two versions?
RPhani
Participant
Posts: 32
Joined: Sun Aug 26, 2012 7:03 am
Location: Hyd

Post by RPhani »

Hi,

What are the data types and lengths of the duplicate key columns? Char or Varchar?

I don't think there is any difference in the algorithm.
----------------------
Phani
nikhil_bhasin
Participant
Posts: 50
Joined: Tue Jan 19, 2010 4:14 am

Post by nikhil_bhasin »

If you mean the data types of the key columns, they are integer and date.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

How are you sorting? It is best to use a Sort stage and explicitly specify "Stable Sort = true" to remove the non-deterministic part of your problem.

Since the data

A,B,C,1
A,B,D,2

is only sorted on "A" and "B", the order of these two records might differ from run to run when a stable sort is not used.
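To illustrate the point outside DataStage, here is a minimal Python sketch (not DataStage code). The helper `retain_first` is a hypothetical name, and it only assumes that "duplicate to retain = first" keeps the first row of each run of equal keys. Feeding it the two possible orderings of the tied rows reproduces the two outputs reported in this thread.

```python
from itertools import groupby

def retain_first(rows, key_len=2):
    """Keep the first row of each run of equal keys.

    Hypothetical helper: rows must already be sorted on the first
    key_len columns (colA, colB in the thread's example).
    """
    return [next(group) for _, group in groupby(rows, key=lambda r: r[:key_len])]

# Both lists are validly sorted on (colA, colB); only the order of the tie differs.
order_1 = [("A", "B", "C", 1), ("A", "B", "D", 2), ("B", "C", "D", 1)]
order_2 = [("A", "B", "D", 2), ("A", "B", "C", 1), ("B", "C", "D", 1)]

print(retain_first(order_1))  # same rows as the 8.1 output in the question
print(retain_first(order_2))  # same rows as the 8.7 output in the question
```

Both inputs satisfy the sort on colA and colB, so the difference in output comes only from how the tie is ordered.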
nikhil_bhasin
Participant
Posts: 50
Joined: Tue Jan 19, 2010 4:14 am

Post by nikhil_bhasin »

@ArndW
I am using the sort option in the Partitioning tab of the Remove Duplicates stage itself. I am not very clear about the pros and cons of using a stable sort, but I will try it and post back the results.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

If you sort the following rows that have 4 columns

A,B,C,1
B,C,D,1
A,B,D,2

on the first 2 columns using a non-stable (but faster) sort you might get a result of:

A,B,C,1
A,B,D,2
B,C,D,1

or you might get a result of:

A,B,D,2
A,B,C,1
B,C,D,1

This is due to the way the sort algorithm works internally: it creates groups and subtrees, and in doing so it might change the order of rows that share the same sort key values. Using "stable sort" guarantees that the order of rows for duplicates is identical to the source order, but a stable sort can be a lot slower and less efficient.
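The stable-sort guarantee described above can be seen in a small Python sketch (again, not DataStage code). Python's built-in `sorted` is documented as stable, so rows with equal keys keep their source order. The last step, adding colD as an extra sort key, is an assumption of this sketch and was not suggested in the thread; it is one way to get the same order regardless of how the rows arrive.

```python
def sort_on(rows, *cols):
    """Sort rows on the given column indexes using Python's stable sorted()."""
    return sorted(rows, key=lambda r: tuple(r[c] for c in cols))

# Source order taken from the post above.
source = [("A", "B", "C", 1), ("B", "C", "D", 1), ("A", "B", "D", 2)]
# The same rows arriving in the opposite order.
reversed_source = list(reversed(source))

# Stable sort on colA, colB: tied rows stay in arrival order.
stable = sort_on(source, 0, 1)
stable_rev = sort_on(reversed_source, 0, 1)

# Assumption: colD added as a tie-breaker, so arrival order no longer matters.
tie_broken = sort_on(source, 0, 1, 3)
tie_broken_rev = sort_on(reversed_source, 0, 1, 3)

print(stable[0], stable_rev[0])          # first "A,B" row depends on arrival order
print(tie_broken[0], tie_broken_rev[0])  # first "A,B" row is the same either way
```

A stable sort makes the result reproducible only when the source order itself is reproducible; a tie-breaker column removes that dependency at the cost of sorting on more keys.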
Hanumantharao Allada
Participant
Posts: 4
Joined: Thu Aug 15, 2013 12:54 pm
Location: Bangalore

Re: Different results in 8.7 job than 8.1 version job

Post by Hanumantharao Allada »

Hi nikhil_bhasin,

Has this issue been resolved?

If not, can you please confirm:
1) The number of nodes you are using in 8.1 and in 8.7 for this job?
2) Are you using any range lookup in the job?
Thanks & Regards

HR
hrdstage@gmail.com
weiyi_will
Participant
Posts: 10
Joined: Sun Aug 11, 2013 10:46 pm
Location: Dalian

Post by weiyi_will »

ArndW wrote: How are you sorting? Best use a sort stage and explicitly specify "Stable Sort = true" to remove the non-deterministic part of your problem.

Since the data

A,B,C,1
A,B,D,2

is only sorted on "A" and "B" the record order when not using a stable sort might be different.
Agree!