Standard Deviation in DataStage?

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Standard Deviation in DataStage?

Post by ArndW »

I would very much like to calculate standard deviation inside a DataStage job and/or a DataStage function. Has anyone here either written such a function or have any recommendations on how to achieve this?

I know I can do this in Excel but I would like to keep this contained inside a DataStage job.
kumar_s
Charter Member
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

HI Arnd,
Not sure if you ready to use intermediat stage/ files/Jobs, I gess it could be done through Datastage, if you are ready to pass the total count and average of the value that you need to find the SD as parameter.
If you able to find the following for each record

Code: Select all

(x - pAvgx)*(x - pAvgx)

and sum it using Aggregator, and if you find

Code: Select all

SQRT(sum value/pTotalCount)
I guess you can get the value.
Where pAvgx and pTotalCount are parameters.
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

I looked at the formula on Wikipedia and it wasn't too bad, so I wrote:

Code: Select all

*************************************************************************
** accept a delimited string of values to and a standard deviation. **
** is started.                                                         **
**                                                                     **
** Ver.  Date       Author   Comments                                  **
** ===== ========== ======== ========================================= **
** 1.0.0 2006/10/10 AWussing Initial coding and testing                **
**                                                                     **
*************************************************************************
   Ans       = '' ;** assume an invalid return
   WorkList  = CONVERT(' ,;:':@VM:@TM:@IM,@FM:@FM:@FM:@FM:@FM:@FM,DynamicList)
   ListCount = DCOUNT(WorkList,@FM)
   IF ListCount > 1
   THEN
      Median    = SUM(WorkList)/ListCount
      Summation = 0
      FOR x = 1 TO ListCount   
         WorkNumber = WorkList<x>
         Summation += (WorkNumber-Median)**2
      NEXT x
      Ans = (Summation / ListCount) ** .5
   END ;** of if-then we have a valid list
ray.wurlod
Participant
Posts: 54595
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

The Aggregator stage performs standard deviation as one of the aggregate functions.

Make sure there are no nulls in the column being processed.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Post Reply