Page 1 of 1

Standard Deviation in DataStage?

Posted: Tue Oct 10, 2006 7:18 am
by ArndW
I would very much like to calculate standard deviation inside a DataStage job and/or a DataStage function. Has anyone here either written such a function or have any recommendations on how to achieve this?

I know I can do this in Excel but I would like to keep this contained inside a DataStage job.

Posted: Tue Oct 10, 2006 8:12 am
by kumar_s
HI Arnd,
Not sure if you ready to use intermediat stage/ files/Jobs, I gess it could be done through Datastage, if you are ready to pass the total count and average of the value that you need to find the SD as parameter.
If you able to find the following for each record

Code: Select all

(x - pAvgx)*(x - pAvgx)

and sum it using Aggregator, and if you find

Code: Select all

SQRT(sum value/pTotalCount)
I guess you can get the value.
Where pAvgx and pTotalCount are parameters.

Posted: Tue Oct 10, 2006 8:21 am
by ArndW
I looked at the formula on Wikipedia and it wasn't too bad, so I wrote:

Code: Select all

*************************************************************************
** accept a delimited string of values to and a standard deviation. **
** is started.                                                         **
**                                                                     **
** Ver.  Date       Author   Comments                                  **
** ===== ========== ======== ========================================= **
** 1.0.0 2006/10/10 AWussing Initial coding and testing                **
**                                                                     **
*************************************************************************
   Ans       = '' ;** assume an invalid return
   WorkList  = CONVERT(' ,;:':@VM:@TM:@IM,@FM:@FM:@FM:@FM:@FM:@FM,DynamicList)
   ListCount = DCOUNT(WorkList,@FM)
   IF ListCount > 1
   THEN
      Median    = SUM(WorkList)/ListCount
      Summation = 0
      FOR x = 1 TO ListCount   
         WorkNumber = WorkList<x>
         Summation += (WorkNumber-Median)**2
      NEXT x
      Ans = (Summation / ListCount) ** .5
   END ;** of if-then we have a valid list

Posted: Tue Oct 10, 2006 1:47 pm
by ray.wurlod
The Aggregator stage performs standard deviation as one of the aggregate functions.

Make sure there are no nulls in the column being processed.