Page 1 of 1

Challenging design problem

Posted: Tue Feb 09, 2016 3:43 pm
by amouhi1
Hi Datastage gurus,

Here's the challenge:

I have a table like:
----------------------------
Indicators|Business rule|
----------------------------
Ind1 |R1+R2 |
----------------------------
Ind2 |R1-R2+R3 |
----------------------------

and an other table with the values of the components of the Business rule:

------------------------
component |Value |
------------------------
R1 |100 |
------------------------
R2 |50 |
------------------------
R3 |200 |
------------------------

My need is to evaluate each indicator, to have something like:
----------------------------
Indicators|Value|
----------------------------
Ind1 |150 |
----------------------------
Ind2 |250 |
----------------------------

Please let me know if you have any idea that would help.

Thanks,
Ahmed

Posted: Tue Feb 09, 2016 4:22 pm
by ray.wurlod
At least three solutions exist. What have YOU tried?

Posted: Wed Feb 10, 2016 10:09 am
by amouhi1
I tried to pivot the 2nd table to have something like:
------------
R1 |R2 |R3
------------
100 |50 |200
------------

and then with a join b/w the 2 tables (using a technical key)
-----------------------------------------------
Indicators |Business rule | R1 |R2 |R3
------------------------------------------------
Ind1 |R1+R2 | 100 |50 |200
------------------------------------------------
Ind2 |R1-R2+R3 |100 |50 |200
------------------------------------------------

and with a transformer i tried to replace each + sign with "+DSLink." (same for - sign and adding "DSLink." at the begining) to have something like:
---------------------------------------
Indicators |Business rule
---------------------------------------
Ind1 |DSlink.R1+DSlink.R2
---------------------------------------
Ind2 |DSlink.R1-DSlink.R2+DSlink.R3
---------------------------------------

but with datastage 8.1 I couldn't find a function to replace a substring with another.

Posted: Wed Feb 10, 2016 10:35 am
by chulett
One doesn't exist until a much later release, I'm afraid. Depending on your C++ coding chops you can look here for the PX version of the Server EReplace function, which an exact search for "PXEReplace" should turn up I do believe.

Posted: Wed Feb 10, 2016 9:46 pm
by rkashyap
What is the source of the ComponentValue and BusinessRule file?
How big is the ComponentValue file?
Which flavor of Unix are you using?

Posted: Wed Feb 10, 2016 11:47 pm
by ray.wurlod
rkashyap wrote:What is the source of the ComponentValue and BusinessRule file?
How big is the ComponentValue file?
Which flavor of Unix are you using?
Why do any of those matter?

Posted: Thu Feb 11, 2016 12:06 am
by ray.wurlod
I think the easiest solution is to use a loop in a Transformer stage to extract each of the delimited components from Business_Rule. This output can then stream into a Lookup stage to effect the conversions.

Posted: Thu Feb 11, 2016 12:51 am
by priyadarshikunal
Loop variable was introduced in 8.5, so no looping in 8.1 I guess. I would simply vertical pivot, join and then aggregate. Or split in different columns and do multiple lookups if number of components are low and finite.

Posted: Thu Feb 11, 2016 8:15 am
by chulett
ray.wurlod wrote:
rkashyap wrote:What is the source of the ComponentValue and BusinessRule file?
How big is the ComponentValue file?
Which flavor of Unix are you using?
Why do any of those matter?
Probably because there's some awk coming. :wink:

Posted: Thu Feb 11, 2016 11:13 am
by rkashyap
Craig, You are right. :) The options that I am thinking of, involve using operating system and database capabilities in conjunction with DataStage.

1. Perform all calculations in a database script/procedure. Invoke database module from DataStage to extract output.
OR

2. Utilizing either database or Unix (sed/awk), split signs and components into separate columns. Subsequently pivot and aggregate the values. i.e.
2.1. Extract data from BusinessRuleTable, using delimiter ',+' append key(IND) with value of BusinessRule column. If leading '-' sign exists in BusinessRule, then substitute accordingly.

Code: Select all

Output would be
      Ind1,+R1+R2 
      Ind2,+R1-R2+R3
2.2. Add delimiters to split BusinessRule values to separate columns.

Code: Select all

 sed 's/+/,+,/g;s/-/,-,/g'
Note: 2.2 can also be performed in database SELECT statement.

2.3. Pivot the data, by either Pivot enterprise stage or Unix

Code: Select all

awk -F"," '{for(i=2;i<=NF;i+2){print $1","$i","$(i+1)}}'
Output would be
      Ind1,+,R1
      Ind1,+,R2 
      Ind2,+,R1
      Ind2,-,R2
      Ind2,+,R3
2.4. Using Lookup stage, replace Component(R1, R2 ...) with their values.

2.5. Aggregate the data using transformer caching (or after combining sign with value in Aggregator stage).
OR

3. Similar to #2 above; but using csv files.
3.1. Write contents of the two tables to csv files BusinessRuleFile and ValueFile.

3.2. Substitute "Indicators" with values in BusinessRule File and then PIVOT. This can be done in External Source stage by passing following command

Code: Select all

sed -e 's/^/s\//' -e 's/\,/\//' -e 's/$/\,\/g;/' <ValueFile>|tr '\n' ' '|xargs -i sed {} <BusinessRuleFile>|awk -F"," '{for(i=2;i<NF;i++){print $1","$i}}
Output would be
      Ind1,100
      Ind1,+50 
      Ind2,100
      Ind2,-50
      Ind2,+200
Aggregate values on key(IND).

Possible limitations: #3 will not work if value file is large as there is a finite limit on length of argument passed to xargs and also memory available to awk arrays. These limits vary by flavor of Unix.

:idea: Consider upgrading to a newer/supported version DataStage.