regarding usage of a field that is not part of Aggregator.

Posted: Tue Feb 07, 2012 3:11 pm
by srinath51
Hi All,
I have a requirement.

Please advise me on the job flow.

I have 6 fields coming out of a Dataset. Five of these fields go through the Aggregator; one of the five is Amount, which is calculated based on the sequence of the other 4 fields.

I can use these 5 fields in the Aggregator, write the output to a Copy stage and use them in a Transformer.

But my question is: how can I also get field 6 into the Transformer stage (as this field is part of the output)?

This is my job flow right now:

dataset ====>aggregator====>copy =====>transformer=====> seqfile.

Please help me with how I can get field 6 into the Transformer, even though I didn't use the field in the Aggregator.

Re: regarding usage of a field that is not part of Aggregator

Posted: Tue Feb 07, 2012 4:09 pm
by kwwilliams
You would have to use another design, but it would likely create a cartesian product as a result. Assume your dataset has 100 rows in it and you run 5 columns through the aggregator, which reduces them to 20 aggregated rows. How would you fit the 6th column (which still has 100 rows) together with the aggregated rows?

 (the top two copy stages are just place holders for neatness)

                  copy -----------------------copy
                    |                           |
dataset --------  copy ----- aggregator ------lookup----transformer



Posted: Tue Feb 07, 2012 4:16 pm
by Kryt0n
A standard split join... send the 4 key fields + the 1 aggregated field down to the aggregator, send the 4 key fields + the 1 other field down a second stream, then join them on the key on the other side

Edit: and do a dedup on the second stream... (memory not serving well)
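
For anyone who wants to see the split join outside of DataStage, here is a minimal Python sketch (not DataStage code). It assumes each row is a tuple of the 4 key fields, the extra 6th field, then the amount, and that the dedup keeps the first value seen per key. The name `split_join` and the sample rows are made up for illustration.

```python
from decimal import Decimal


def split_join(rows):
    """Fork rows into two streams, aggregate one, dedup the other, join on the key.

    Each row is assumed to be (key1, key2, key3, key4, other, amount).
    """
    totals = {}       # stream 1: key -> summed amount (the aggregator)
    first_other = {}  # stream 2: key -> first 6th-field value seen (the dedup)
    for *key_fields, other, amount in rows:
        key = tuple(key_fields)
        totals[key] = totals.get(key, Decimal("0")) + amount
        first_other.setdefault(key, other)
    # Join: both streams now hold exactly one entry per key.
    return [key + (first_other[key], totals[key]) for key in totals]


# Made-up rows, purely for illustration.
rows = [
    ("A", "x", "p", 1, "d1", Decimal("1.50")),
    ("A", "x", "p", 1, "d2", Decimal("2.25")),
    ("B", "y", "q", 2, "d3", Decimal("4.00")),
]
for joined in split_join(rows):
    print(joined)
```

Which value the dedup keeps (first, last, any) is an assumption here; without the dedup, the join would return one row per source row instead of one per key.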

Posted: Tue Feb 07, 2012 9:26 pm
by kandyshandy
Let's get more info about the 6th field.. Is your 6th field going to be constant for the "4 keys combination" based on which you aggregate the amount?

Posted: Wed Feb 08, 2012 8:02 am
by srinath51
Hi Kandy,
The 6th field is not constant for the "4 keys" by which I aggregate the Amount.

Posted: Wed Feb 08, 2012 8:18 am
by HendrikB
Maybe you could provide some sample data for the 6 input columns and the aggregated (and partly calculated) result data set ...?
Maybe it would make sense NOT to use the aggregator stage for your requirements.

Posted: Wed Feb 08, 2012 8:42 am
by srinath51
Hi all,
Here are the sample fields and data:

Buss unit | Cust code | Product key | Seq no | Billing Date | Amount
5001      | AAT       | 1230        | 1      | jan06        |    0.12
5001      | AAT       | 1230        | 1      | jan06        |    0.87
5001      | AAP       | 2437        | 1      | jan06        |    0.89
6161      | AAP       | 2437        | 2      | feb02        | -128.27
6161      | AAP       | 2437        | 2      | feb02        |   12.32
7652      | AAT       | 2437        | 3      | feb02        |    8.76
7652      | AAT       | 2437        | 3      | feb02        |    7.23


Here is the requirement: I am generating a text file. Amount should be the aggregated field, grouped on the keys in the following order: Buss unit, Cust code, Product key, Seq no.

I am able to achieve this with the Aggregator stage, using Amount as the column for calculation and Buss unit, Cust code, Product key, Seq no as the grouping keys.

But when I generate the output from the Transformer stage, I also need the leftover field, i.e. "Billing Date".

So my question is: how can I get "Billing Date" into the Transformer stage so that I can write it to the output?
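
To make the current behaviour concrete, here is a rough Python sketch (an illustration only, not DataStage code) of what the Aggregator does with the sample rows above. The name `aggregate_amount` and the tuple layout are assumptions. The point is that Billing Date is not in the grouped result, which is why it never reaches the Transformer.

```python
from decimal import Decimal

# Sample rows from this post:
# (buss_unit, cust_code, product_key, seq_no, billing_date, amount)
SAMPLE = [
    ("5001", "AAT", "1230", "1", "jan06", Decimal("0.12")),
    ("5001", "AAT", "1230", "1", "jan06", Decimal("0.87")),
    ("5001", "AAP", "2437", "1", "jan06", Decimal("0.89")),
    ("6161", "AAP", "2437", "2", "feb02", Decimal("-128.27")),
    ("6161", "AAP", "2437", "2", "feb02", Decimal("12.32")),
    ("7652", "AAT", "2437", "3", "feb02", Decimal("8.76")),
    ("7652", "AAT", "2437", "3", "feb02", Decimal("7.23")),
]


def aggregate_amount(rows):
    """Sum Amount per (buss_unit, cust_code, product_key, seq_no) group."""
    totals = {}
    for buss_unit, cust_code, product_key, seq_no, _billing_date, amount in rows:
        key = (buss_unit, cust_code, product_key, seq_no)
        totals[key] = totals.get(key, Decimal("0")) + amount
    return totals


# Each output entry has the 4 keys and the total, but no Billing Date.
for key, total in aggregate_amount(SAMPLE).items():
    print(key, total)
```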

Posted: Wed Feb 08, 2012 9:11 am
by HendrikB
Taking a further look at your sample data (on the first group only) ...

Buss unit | Cust code | Product key | Seq no | Billing Date | Amount
5001      | AAT       | 1230        | 1      | jan06        | 0.12
5001      | AAT       | 1230        | 1      | jan06        | 0.87

If Billing Date is not part of the key, the data could also look like this, right?

Buss unit | Cust code | Product key | Seq no | Billing Date | Amount
5001      | AAT       | 1230        | 1      | jan06        | 0.12
5001      | AAT       | 1230        | 1      | jan07        | 0.87

After aggregation then you want to write 2 rows to target (?)

Buss unit | Cust code | Product key | Seq no | Billing Date | Amount
5001      | AAT       | 1230        | 1      | jan06        | 0.99
5001      | AAT       | 1230        | 1      | jan07        | 0.99

Please confirm ...

Posted: Wed Feb 08, 2012 10:02 am
by srinath51
Hi HendrikB,

For group 1, this is the output I am expecting:
5001 AAT 1230 1 jan06 0.99
5001 AAP 2437 1 jan06 0.89

Posted: Wed Feb 08, 2012 10:20 am
by HendrikB
OK, if your calculation should be done on a daily basis, just add the billing date to your keys in the aggregator stage ...
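
As a rough illustration in Python (not DataStage code; `aggregate`, `make_row` and the dict field names are hypothetical), adding Billing Date to the grouping keys looks like this. When the date is constant within a group the totals are unchanged, and when it varies the group splits into one total per date.

```python
from decimal import Decimal

FOUR_KEYS = ("buss_unit", "cust_code", "product_key", "seq_no")
FIVE_KEYS = FOUR_KEYS + ("billing_date",)


def aggregate(rows, key_fields):
    """Sum the 'amount' field per distinct combination of key_fields."""
    totals = {}
    for row in rows:
        key = tuple(row[field] for field in key_fields)
        totals[key] = totals.get(key, Decimal("0")) + row["amount"]
    return totals


def make_row(billing_date, amount):
    """Build a row for the first sample group (5001 / AAT / 1230 / 1)."""
    return {
        "buss_unit": "5001", "cust_code": "AAT", "product_key": "1230",
        "seq_no": "1", "billing_date": billing_date, "amount": Decimal(amount),
    }


same_day = [make_row("jan06", "0.12"), make_row("jan06", "0.87")]
mixed_days = [make_row("jan06", "0.12"), make_row("jan07", "0.87")]  # hypothetical variant

print(aggregate(same_day, FIVE_KEYS))    # one group, date carried through
print(aggregate(mixed_days, FIVE_KEYS))  # one group per billing date
```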

Posted: Wed Feb 08, 2012 2:24 pm
by srinath51
OK, I will include it in the Aggregator stage.

Posted: Wed Feb 08, 2012 3:10 pm
by Kryt0n
You should be asking the business what their requirements are: should it be aggregated by day, month, year, eternity... If they want it at a higher level than day but still want the day, then ask them which day: first, last, any... dsxchange cannot provide you with your requirements.

Posted: Wed Feb 08, 2012 3:43 pm
by kumar_s
As mentioned earlier, Billing Date should be part of your aggregation key group.
But technically, if you want to leave one field out of the aggregation and join it back in later, you'll have to do literally that.
You'll have to use a Join stage (or another stage that can join) right after the Aggregator to bring the field back in from the source, based on the keys.
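
A rough Python sketch of that join-back design (illustration only, not DataStage code; `join_back` and the row layout are assumptions). Because the source stream is not deduplicated here, every source row comes back out, each carrying its group's total.

```python
from decimal import Decimal


def join_back(rows):
    """Aggregate amount per 4-field key, then join the totals back to every source row.

    Each row is assumed to be (key1, key2, key3, key4, other, amount).
    """
    totals = {}
    for *key_fields, _other, amount in rows:
        key = tuple(key_fields)
        totals[key] = totals.get(key, Decimal("0")) + amount

    joined = []
    for *key_fields, other, _amount in rows:
        key = tuple(key_fields)
        joined.append(key + (other, totals[key]))
    return joined


# Hypothetical variant of the first sample group where Billing Date differs.
rows = [
    ("5001", "AAT", "1230", "1", "jan06", Decimal("0.12")),
    ("5001", "AAT", "1230", "1", "jan07", Decimal("0.87")),
]
for joined_row in join_back(rows):
    print(joined_row)
```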

Posted: Wed Feb 08, 2012 8:31 pm
by kandyshandy
srinath51 wrote:
Hi Kandy,
the 6th field is not a constant field for "4 keys" by which i aggregate the AMOUNT.

But your sample data has it as a constant. Requirements should be clear to come up with the right approach.

Posted: Wed Feb 08, 2012 10:50 pm
by ray.wurlod
Use Sample stage with Skip = 4 but otherwise a 100% sample.