Key and PK columns

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
srini.dw
Premium Member
Premium Member
Posts: 186
Joined: Fri Aug 18, 2006 1:59 am
Location: Chennai

Key and PK columns

Post by srini.dw »

Hi,

Have 1 doubt, can anyone please clarify.

Suppose I have 5 columns A,B,C,D,E.

I got 2 jobs

Job 1. SQL Server -> Copy -> DataSet
Job 2. DataSet -> Column_Generator -> Oracle.

A,B are PK columns in source(SQL Server) and column A is Pk column in Target Table(Oracle).

In the copy stage, done Hash partition by A,B and in the DataSet, partition is SAME.
In the second also in Column Generator and in Oracle stage, partition is SAME.

In the Oracle stage, I have used Update then Insert, Where clause for Update I have used

WHERE (B = ORCHESTRATE.B)

We are in the grid environment, I have run couple of times, count is getting match between source and target.

Question: Is the approach correct or any changes has to be made.

Thank You,
pavi
Premium Member
Premium Member
Posts: 34
Joined: Mon Jun 03, 2013 2:34 pm

Post by pavi »

why did you go for hash partitioning in copy stage?
if A,B are primary keys in source,How can A alone can be primary key in target?

Ex:

Src:
A,B
1,2
1,3

Tgt:
A
1
1
This scenorio will fail at table level because we see dups in column A.I think you need to look at your requirement.
srini.dw
Premium Member
Premium Member
Posts: 186
Joined: Fri Aug 18, 2006 1:59 am
Location: Chennai

Post by srini.dw »

Thanks for the reply.

Hash partitioning in copy stage because based on 2 fields, records would be distributed. Is there any issue with this.

Reg. the duplicate, I have checked, there are no columns like this, my mistake.

I will re-phrase my scenerio.

Job 1. SQL Server -> Copy -> DataSet
Job 2. DataSet -> Column_Generator (C.G) -> LKP -> Modify Stage-> Oracle

Lookup reference is Entire.

Columns names are A,B,C,D,E

Scenario 1

Column A is PK in the source.
Column A is PK in the target

Scenario 2

Column A is PK in the source.
Column B,C are PK in the target


Scenario 3

Column A is PK in the source.
Column A, B, C are PK in the target.


Solution for scenario 1 would be do a hash partition on column A in copy stage and keep SAME in rest of stages.

Solution for scenario 2 would be do a hash partition on column A in copy stage
and keep SAME in DataSet, C.G, Lookup and in Modify stage do a hash partion on columns B and C, Oracle -> SAME

Solution for scenario 3 would be do a hash partition on column A in copy stage
and keep SAME in DataSet, C.G, Lookup and in modify stage do a hash partion on both columns A, B and C, Oracle -> SAME

Is the solutions look good or any changes has to be made.

Appreciate your help on this.

Thank You.
pavi
Premium Member
Premium Member
Posts: 34
Joined: Mon Jun 03, 2013 2:34 pm

Post by pavi »

I think hash partitioning is not required in any scenario.we do hash partitioning when we want to group similar keys in 1 node so as to apply remove dups.I dont think you have that case.so make the job simple and dont use hash partitioning at all any where in the job.as long as PK is satisfied at target you are fine.it doesnt matter which record gets inserted first.
srini.dw
Premium Member
Premium Member
Posts: 186
Joined: Fri Aug 18, 2006 1:59 am
Location: Chennai

Post by srini.dw »

Thanks for the input.

Will change the partitioning of the jobs to SAME.

Thanks,
Naveen
pavi
Premium Member
Premium Member
Posts: 34
Joined: Mon Jun 03, 2013 2:34 pm

Post by pavi »

Can you mark it as resolved if it resolved?

Thank You
Post Reply