Add column to input

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
kumarjit
Participant
Posts: 99
Joined: Fri Oct 12, 2012 7:47 am
Location: Kolkata

Add column to input

Post by kumarjit »

Hello All.

I've a source text file with data like below:

Code: Select all

Field1
A
A
B
C
C
D
With this data, I have to generate a temporary dataset, as below:

Code: Select all

Field1 Field2
A      Y 
A      Y
B      N
C      Y
C      Y
D      N
Field2 is populated based on the following logic:
1.If the corresponding Field1 value has duplicates, the Y
2. Else N

The way I know to do this is(maybe a bit crude)

Code: Select all

1. Create a parallel job as the one below
             Seq file2
               |
               |
            Aggregator(record count, group by Field1)
               |
               |
Seq file1-----Join-----------------Seq File2(write the join output)
             based on Field1

2. Execute a Unix command as post job sub routine to add a column which performs the following: 
a. For record where record count is >1 add a new field with value Y
b. For record where record count is =1 add a new field with value N
But I'm to create a dataset output, and not any text file, and Unix commands do not work on dataset files.
As because I'm to introduce a new column, can't Column Generator stage serve this purpose, without using any Transformer/Sort?


Please help.

Regards,
Kumarjit.
Pain is the best teacher, but very few attend his class..
udayanguha
Premium Member
Premium Member
Posts: 37
Joined: Wed Oct 29, 2014 10:48 pm
Location: Ohio

Post by udayanguha »

You can use a sort stage to generate key change column. Then in the transformer through stage variables check for change in key change column and assign value accordingly.
If key change column is '0', assign Y. If previous key change column was 0 and current is '1', assign 'Y' else 'N'
kumarjit
Participant
Posts: 99
Joined: Fri Oct 12, 2012 7:47 am
Location: Kolkata

Post by kumarjit »

If you had checked the last few lines of my post, you might remember that I'm trying to achieve this goal WITHOUT USING TRANFORMER/SORT STAGES.....

Anywayz, thanks for your feed.

Regards.
Pain is the best teacher, but very few attend his class..
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Are you permitted to use a Modify stage?

If yes use a column generator to generate "Y" for all rows then use the Modify stage to convert the NULL from left outer join into "N". And/or use a fork/join to split the streams based on the result of the join (or lookup).
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
kumarjit
Participant
Posts: 99
Joined: Fri Oct 12, 2012 7:47 am
Location: Kolkata

Post by kumarjit »

Thanks Ray, but I was not able to view you full post as it's Premium Content . :)
However, I will try and change the design to extent I was able to see in your post.

Regards,
Kumarjit.
Pain is the best teacher, but very few attend his class..
AshishDevassy
Participant
Posts: 32
Joined: Fri Nov 07, 2014 11:18 am

Post by AshishDevassy »

Is there a reason that you dont wish to use transformer ?
that's wierd ?
kumarjit
Participant
Posts: 99
Joined: Fri Oct 12, 2012 7:47 am
Location: Kolkata

Post by kumarjit »

AshishDevassy wrote:Is there a reason that you dont wish to use transformer ?
I intend not to load the job, when the same can be achieved by other lightweight stages like column generator and/or modify stages.

Regards.
Pain is the best teacher, but very few attend his class..
kumarjit
Participant
Posts: 99
Joined: Fri Oct 12, 2012 7:47 am
Location: Kolkata

Post by kumarjit »

What I'm trying to do is:

Code: Select all


           Seq file2 
               | 
               | 
            Aggregator(record count, group by Field1) 
               | 
               | 
Seq file1-----Join-----------------Column Generator Stage--------------------------Taget Dataset
				   (column to generate=F2, column method=Explicit)   

In the Mapping tab of the column generator stage, add the following as the derivation for the output field F2
If(input.count=1 then'N' else 'Y')


But, can I create such derivations against an output column of the column generator stage?

Please advise.

Regards.
Pain is the best teacher, but very few attend his class..
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

kumarjit wrote:I intend not to load the job, when the same can be achieved by other lightweight stages like column generator and/or modify stages.
You are relying on out-of-date knowledge. These days (since about version 8.7) the Transformer stage is no less efficient than most other stages, sometimes it's more efficient (for example than the Filter stage).
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
kumarjit
Participant
Posts: 99
Joined: Fri Oct 12, 2012 7:47 am
Location: Kolkata

Post by kumarjit »

ray.wurlod wrote:You are relying on out-of-date knowledge. ...
I'm afraid to admit that its true to some sense. But if there are not more than 1K rows in the input, should I be trying something as time consuming as a transformer?

Please advise.

Regards.
Pain is the best teacher, but very few attend his class..
priyadarshikunal
Premium Member
Premium Member
Posts: 1735
Joined: Thu Mar 01, 2007 5:44 am
Location: Troy, MI

Post by priyadarshikunal »

What makes you think transformer is a time consuming stage. The weight of transformer has decreased over time and its not an expensive stage anymore. Now its even lighter than filter and switch stages. If you can combine work of 2 or more stages in transformer, it may give you better result as well.

I think you were not able to see the complete reply from Ray.
Priyadarshi Kunal

Genius may have its limitations, but stupidity is not thus handicapped. :wink:
priyadarshikunal
Premium Member
Premium Member
Posts: 1735
Joined: Thu Mar 01, 2007 5:44 am
Location: Troy, MI

Post by priyadarshikunal »

In addition, Join and Aggregation needs sorted as well as partitioned data, so it will insert a sort under the covers as well.
Priyadarshi Kunal

Genius may have its limitations, but stupidity is not thus handicapped. :wink:
kumarjit
Participant
Posts: 99
Joined: Fri Oct 12, 2012 7:47 am
Location: Kolkata

Post by kumarjit »

I'm not a premium member, and I'm not able to view Ray's posts.
Anyways, thanks to all of you for your time and suggestions.

Warm Regards.
Pain is the best teacher, but very few attend his class..
qt_ky
Premium Member
Premium Member
Posts: 2895
Joined: Wed Aug 03, 2011 6:16 am
Location: USA

Post by qt_ky »

Well by all means, sign up. It's incredibly affordable.
Choose a job you love, and you will never have to work a day in your life. - Confucius
Post Reply