Challenging design problem

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
amouhi1
Participant
Posts: 4
Joined: Wed Jan 25, 2012 6:54 am

Challenging design problem

Post by amouhi1 »

Hi Datastage gurus,

Here's the challenge:

I have a table like:
----------------------------
Indicators|Business rule|
----------------------------
Ind1 |R1+R2 |
----------------------------
Ind2 |R1-R2+R3 |
----------------------------

and an other table with the values of the components of the Business rule:

------------------------
component |Value |
------------------------
R1 |100 |
------------------------
R2 |50 |
------------------------
R3 |200 |
------------------------

My need is to evaluate each indicator, to have something like:
----------------------------
Indicators|Value|
----------------------------
Ind1 |150 |
----------------------------
Ind2 |250 |
----------------------------

Please let me know if you have any idea that would help.

Thanks,
Ahmed
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

At least three solutions exist. What have YOU tried?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
amouhi1
Participant
Posts: 4
Joined: Wed Jan 25, 2012 6:54 am

Post by amouhi1 »

I tried to pivot the 2nd table to have something like:
------------
R1 |R2 |R3
------------
100 |50 |200
------------

and then with a join b/w the 2 tables (using a technical key)
-----------------------------------------------
Indicators |Business rule | R1 |R2 |R3
------------------------------------------------
Ind1 |R1+R2 | 100 |50 |200
------------------------------------------------
Ind2 |R1-R2+R3 |100 |50 |200
------------------------------------------------

and with a transformer i tried to replace each + sign with "+DSLink." (same for - sign and adding "DSLink." at the begining) to have something like:
---------------------------------------
Indicators |Business rule
---------------------------------------
Ind1 |DSlink.R1+DSlink.R2
---------------------------------------
Ind2 |DSlink.R1-DSlink.R2+DSlink.R3
---------------------------------------

but with datastage 8.1 I couldn't find a function to replace a substring with another.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

One doesn't exist until a much later release, I'm afraid. Depending on your C++ coding chops you can look here for the PX version of the Server EReplace function, which an exact search for "PXEReplace" should turn up I do believe.
-craig

"You can never have too many knives" -- Logan Nine Fingers
rkashyap
Premium Member
Premium Member
Posts: 532
Joined: Fri Dec 02, 2011 12:02 pm
Location: Richmond VA

Post by rkashyap »

What is the source of the ComponentValue and BusinessRule file?
How big is the ComponentValue file?
Which flavor of Unix are you using?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

rkashyap wrote:What is the source of the ComponentValue and BusinessRule file?
How big is the ComponentValue file?
Which flavor of Unix are you using?
Why do any of those matter?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

I think the easiest solution is to use a loop in a Transformer stage to extract each of the delimited components from Business_Rule. This output can then stream into a Lookup stage to effect the conversions.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
priyadarshikunal
Premium Member
Premium Member
Posts: 1735
Joined: Thu Mar 01, 2007 5:44 am
Location: Troy, MI

Post by priyadarshikunal »

Loop variable was introduced in 8.5, so no looping in 8.1 I guess. I would simply vertical pivot, join and then aggregate. Or split in different columns and do multiple lookups if number of components are low and finite.
Priyadarshi Kunal

Genius may have its limitations, but stupidity is not thus handicapped. :wink:
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

ray.wurlod wrote:
rkashyap wrote:What is the source of the ComponentValue and BusinessRule file?
How big is the ComponentValue file?
Which flavor of Unix are you using?
Why do any of those matter?
Probably because there's some awk coming. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
rkashyap
Premium Member
Premium Member
Posts: 532
Joined: Fri Dec 02, 2011 12:02 pm
Location: Richmond VA

Post by rkashyap »

Craig, You are right. :) The options that I am thinking of, involve using operating system and database capabilities in conjunction with DataStage.

1. Perform all calculations in a database script/procedure. Invoke database module from DataStage to extract output.
OR

2. Utilizing either database or Unix (sed/awk), split signs and components into separate columns. Subsequently pivot and aggregate the values. i.e.
2.1. Extract data from BusinessRuleTable, using delimiter ',+' append key(IND) with value of BusinessRule column. If leading '-' sign exists in BusinessRule, then substitute accordingly.

Code: Select all

Output would be
      Ind1,+R1+R2 
      Ind2,+R1-R2+R3
2.2. Add delimiters to split BusinessRule values to separate columns.

Code: Select all

 sed 's/+/,+,/g;s/-/,-,/g'
Note: 2.2 can also be performed in database SELECT statement.

2.3. Pivot the data, by either Pivot enterprise stage or Unix

Code: Select all

awk -F"," '{for(i=2;i<=NF;i+2){print $1","$i","$(i+1)}}'
Output would be
      Ind1,+,R1
      Ind1,+,R2 
      Ind2,+,R1
      Ind2,-,R2
      Ind2,+,R3
2.4. Using Lookup stage, replace Component(R1, R2 ...) with their values.

2.5. Aggregate the data using transformer caching (or after combining sign with value in Aggregator stage).
OR

3. Similar to #2 above; but using csv files.
3.1. Write contents of the two tables to csv files BusinessRuleFile and ValueFile.

3.2. Substitute "Indicators" with values in BusinessRule File and then PIVOT. This can be done in External Source stage by passing following command

Code: Select all

sed -e 's/^/s\//' -e 's/\,/\//' -e 's/$/\,\/g;/' <ValueFile>|tr '\n' ' '|xargs -i sed {} <BusinessRuleFile>|awk -F"," '{for(i=2;i<NF;i++){print $1","$i}}
Output would be
      Ind1,100
      Ind1,+50 
      Ind2,100
      Ind2,-50
      Ind2,+200
Aggregate values on key(IND).

Possible limitations: #3 will not work if value file is large as there is a finite limit on length of argument passed to xargs and also memory available to awk arrays. These limits vary by flavor of Unix.

:idea: Consider upgrading to a newer/supported version DataStage.
Post Reply