Challenging design problem

amouhi1 · Post by **amouhi1** » Tue Feb 09, 2016 3:43 pm

Hi Datastage gurus,

Here's the challenge:

I have a table like:
----------------------------
Indicators|Business rule|
----------------------------
Ind1 |R1+R2 |
----------------------------
Ind2 |R1-R2+R3 |
----------------------------

and an other table with the values of the components of the Business rule:

------------------------
component |Value |
------------------------
R1 |100 |
------------------------
R2 |50 |
------------------------
R3 |200 |
------------------------

My need is to evaluate each indicator, to have something like:
----------------------------
Indicators|Value|
----------------------------
Ind1 |150 |
----------------------------
Ind2 |250 |
----------------------------

Please let me know if you have any idea that would help.

Thanks,
Ahmed

ray.wurlod · Post by **ray.wurlod** » Tue Feb 09, 2016 4:22 pm

At least three solutions exist. What have YOU tried?

amouhi1 · Post by **amouhi1** » Wed Feb 10, 2016 10:09 am

I tried to pivot the 2nd table to have something like:
------------
R1 |R2 |R3
------------
100 |50 |200
------------

and then with a join b/w the 2 tables (using a technical key)
-----------------------------------------------
Indicators |Business rule | R1 |R2 |R3
------------------------------------------------
Ind1 |R1+R2 | 100 |50 |200
------------------------------------------------
Ind2 |R1-R2+R3 |100 |50 |200
------------------------------------------------

and with a transformer i tried to replace each + sign with "+DSLink." (same for - sign and adding "DSLink." at the begining) to have something like:
---------------------------------------
Indicators |Business rule
---------------------------------------
Ind1 |DSlink.R1+DSlink.R2
---------------------------------------
Ind2 |DSlink.R1-DSlink.R2+DSlink.R3
---------------------------------------

but with datastage 8.1 I couldn't find a function to replace a substring with another.

chulett · Post by **chulett** » Wed Feb 10, 2016 10:35 am

One doesn't exist until a much later release, I'm afraid. Depending on your C++ coding chops you can look here for the PX version of the Server EReplace function, which an exact search for "PXEReplace" should turn up I do believe.

rkashyap · Post by **rkashyap** » Wed Feb 10, 2016 9:46 pm

What is the source of the ComponentValue and BusinessRule file?
How big is the ComponentValue file?
Which flavor of Unix are you using?

ray.wurlod · Post by **ray.wurlod** » Wed Feb 10, 2016 11:47 pm

rkashyap wrote:What is the source of the ComponentValue and BusinessRule file?
How big is the ComponentValue file?
Which flavor of Unix are you using?

Why do any of those matter?

ray.wurlod · Post by **ray.wurlod** » Thu Feb 11, 2016 12:06 am

I think the easiest solution is to use a loop in a Transformer stage to extract each of the delimited components from Business_Rule. This output can then stream into a Lookup stage to effect the conversions.

priyadarshikunal · Post by **priyadarshikunal** » Thu Feb 11, 2016 12:51 am

Loop variable was introduced in 8.5, so no looping in 8.1 I guess. I would simply vertical pivot, join and then aggregate. Or split in different columns and do multiple lookups if number of components are low and finite.

chulett · Post by **chulett** » Thu Feb 11, 2016 8:15 am

ray.wurlod wrote:
rkashyap wrote:What is the source of the ComponentValue and BusinessRule file?
How big is the ComponentValue file?
Which flavor of Unix are you using?
Why do any of those matter?

Probably because there's some awk coming.

rkashyap · Post by **rkashyap** » Thu Feb 11, 2016 11:13 am

Craig, You are right.

The options that I am thinking of, involve using operating system and database capabilities in conjunction with DataStage.

1. Perform all calculations in a database script/procedure. Invoke database module from DataStage to extract output.
OR

2. Utilizing either database or Unix (sed/awk), split signs and components into separate columns. Subsequently pivot and aggregate the values. i.e.
2.1. Extract data from BusinessRuleTable, using delimiter ',+' append key(IND) with value of BusinessRule column. If leading '-' sign exists in BusinessRule, then substitute accordingly.

Code: Select all

Output would be
      Ind1,+R1+R2 
      Ind2,+R1-R2+R3

2.2. Add delimiters to split BusinessRule values to separate columns.

Code: Select all

 sed 's/+/,+,/g;s/-/,-,/g'

Note: 2.2 can also be performed in database SELECT statement.

2.3. Pivot the data, by either Pivot enterprise stage or Unix

Code: Select all

awk -F"," '{for(i=2;i<=NF;i+2){print $1","$i","$(i+1)}}'
Output would be
      Ind1,+,R1
      Ind1,+,R2 
      Ind2,+,R1
      Ind2,-,R2
      Ind2,+,R3

2.4. Using Lookup stage, replace Component(R1, R2 ...) with their values.

2.5. Aggregate the data using transformer caching (or after combining sign with value in Aggregator stage).
OR

3. Similar to #2 above; but using csv files.
3.1. Write contents of the two tables to csv files BusinessRuleFile and ValueFile.

3.2. Substitute "Indicators" with values in BusinessRule File and then PIVOT. This can be done in External Source stage by passing following command

Code: Select all

sed -e 's/^/s\//' -e 's/\,/\//' -e 's/$/\,\/g;/' <ValueFile>|tr '\n' ' '|xargs -i sed {} <BusinessRuleFile>|awk -F"," '{for(i=2;i<NF;i++){print $1","$i}}
Output would be
      Ind1,100
      Ind1,+50 
      Ind2,100
      Ind2,-50
      Ind2,+200

Aggregate values on key(IND).

Possible limitations: #3 will not work if value file is large as there is a finite limit on length of argument passed to xargs and also memory available to awk arrays. These limits vary by flavor of Unix.

Consider upgrading to a newer/supported version DataStage.