Ray et al,
Ok, full disclosure...I have this working now. Thanks for all your help.
At the risk of seeming
uber-geeky...
I believe this approach to be overly complex and will continue to refactor it until there is nothing left to throw away. Kind of like the processes used in poetry, sculpting or painting...so I'm told by my more creative acquaintences. Keep at it until only the purest form remains.
General Overview
1. There are a pair (1 function and 1 transform) used to edit each column.
2. There are another pair (1 function and 1 transform) for each reject row.
3. There are three stage variables to track state of column and row validations.
4. Each column/field to be validated in the output is passed through the same function.
5. Errors are accumulated in a dynamic, hashed array (ErrLog) defined in named COMMON
6. The dynamic array (ErrLog) is held in named COMMON and accessed through a transformation function in the Reject link to compile a consolidated list of column errors to be associated with the row rejected.
Stage Variables:
Code: Select all
Name Initial Value Description
InRejectMsg "" Input record reject message
RowErrCount 0 Number of errors encountered while editing row
RejectInputRow 0 Boolean True/False for rejected record
RejectInputRow has the following stage variable derivation defined:
RowErrCount > 0
Both
InRejectMsg and
RowErrCount are stage variables managed by the functions auditing each column/row. This management is accomplished via DSGetVar() and DSSetVar().
Example of usage in Output sink:
ValidateTblxxx(Input.CODE_TYPE, "code_type", @INROWNUM, "ExpectedValueHere", 1)
ValidateTblxxx(Input.FACILITY_CODE, "facility_code", @INROWNUM, "RR", 2)
ValidateTblxxx(Input.SALES_CODE, "SALES_CODE", @INROWNUM, "",3)
...
Output sink has the following Constraint defined:
RejectInputRow = @FALSE
Example of usage in Reject link:
Reject sink Column Name:
MessageDerivation:
"ColErr: " : TblxxxRowCK(@INROWNUM) : " |DSLastLinkErr: " : DSLinkLastErr : "|DSStageLastErr: " : DSStageLastErr
Function/Transform used to validate each column:
Code: Select all
* Function Name: TblxxxColumnValidationEdit
* Interface: TblxxxColumnValidationEdit(ColIndex, ColName, InputRowNo, ExpectedValues, ActualValue)
* Referenced later by a custom Transformation: ValidateTblxxx()
* ValidateTblxxx(%column_ordinal%, %col_name%, %input_row_num%, %expected_value%, %actual_value%)
*
* This routine will validate the xxxtable columns
* If an invalid value is found, the input reject message stage variable
* will contain a descriptive reason
*
Common /TblxxxLog/ CommonInitialized, ErrLog, ErrCount, MaxErrCount, RowErrCount, PrevInputRowNumber
$INCLUDE DSINCLUDE JOBCONTROL.H
Equate RoutineName To 'TblXXXColumnValidationEdit'
* ------------------------------------------------------------------
* Define external function links here
* ------------------------------------------------------------------
DEFFUN GetStageVar(varname) Calling "DSGetVar"
DEFFUN SetStageVar(varname, varvalue) Calling "DSSetVar"
DEFFUN UtilityMessageToLog(msg) Calling "DSX.UtilityMessageToLog"
* ------------------------------------------------------------------
* constants used to indicate column action state
* ------------------------------------------------------------------
Equate PASSED To 0
Equate FAILSTAGE To -1
Equate FAILROW To 1
Equate FAILCOLUMN To 2
Equate DEFAULTED To 3
* ------------------------------------------------------------------
* Define useful stage variables here
* ------------------------------------------------------------------
Equate SV_ROWREJECTMSG To "InRejectMsg"
Equate SV_ROWERRORCOUNT To "RowErrCount"
* ------------------------------------------------------------------
* Obtain local copies of stage variable values
* ------------------------------------------------------------------
svRejectMsg = GetStageVar(SV_ROWREJECTMSG)
svRowErrCount = GetStageVar(SV_ROWERRORCOUNT)
* ------------------------------------------------------------------
* build location information
* ------------------------------------------------------------------
sMsgLocation = DSHostName : "." : DSProjectName : "." : DSJobName : "." : DSStageName : "." : DSLinkName : "." RoutineName
*<== remove to debug ==>* status = UtilityMessageToLog(sMsg)
* ------------------------------------------------------------------
* determine if the named COMMON data area requires initialization
* ------------------------------------------------------------------
IF CommonInitialized = 0 THEN; * is this the first time here?
CommonInitialized = 1; * set flag indicating common already initialized
ErrorCount = 0; * reset the error count
MaxErrCount = 1000; * set limit on number of errors to record
ErrLog=""; * initialize the dynamic, hashed array in COMMON
PrevInputRowNumber = -1
END
* ------------------------------------------------------------------
* Make local copies of args
* ------------------------------------------------------------------
iColIndex = ColIndex
sColName = UPCASE(Trim(ColName))
iInputRowNo = InputRowNo
sExpectedValues = ExpectedValues
sActualValue = Trim(ActualValue)
bPriorError = Len(svRejectMsg) > 0; * errors encountered for this row on prior column?
If iInputRowNo <> PrevInputRowNumber THEN
PrevInputRowNumber = iInputRowNo; * initialize current row number
s = SetStageVar(SV_ROWREJECTMSG, ""); * clear the row error message
s = SetStageVar(SV_ROWERRORCOUNT, 0); * reset row-specific error count
RowErrCount = 0; * reset COMMON column error count
END
* ------------------------------------------------------------------
* determine the number of potential values
* if there is only one expected value, assume the expected value
* length to be the same as that of the expected value
* ------------------------------------------------------------------
iExpectedValues = DCount(sExpectedValues,@VM)
IF iExpectedValues = 1 THEN
iExpectedLength = Len(sExpectedValues)
End
* -----------------------------------------------
* if an expected value list was provided,
* Calculate min/max expected value lengths
* -----------------------------------------------
IF iExpectedValues > 0 THEN
iMaxExpectedValueLength = -1; * set impossible maximum expected value length
iMinExpectedValueLength = 99999; * set improbable minimum expected value length
FOR i=1 TO iExpectedValues
iExpectedValueLength = Len(Field(sExpectedValues,@VM,i,1))
IF iExpectedValueLength > iMaxExpectedValueLength THEN
iMaxExpectedValueLength = iExpectedValueLength
END
IF iExpectedValueLength < iMinExpectedValueLength THEN
iMinExpectedValueLength = iExpectedValueLength
END
NEXT
* determine if the column value is in the list of expected values;
* default to found (@true); @false if not
bExpectedValueFound = @TRUE
FIND sActualValue in sExpectedValues Setting F,F,F ELSE bExpectedValueFound = @FALSE END
ELSE
* if no expected values were specified, default to a broad range
iMaxExpectedValueLength = 99999;
iMinExpectedValueLength = 0;
bExpectedValueFound = @TRUE
END; * IF iExpectedValues > 0 THEN
sColRejectReason = ""; * default to no reject reason
iColRejectSeverity = PASSED; * default to column pass
* ----------------------------------------------------------
* column-specific testing begins here...
* ----------------------------------------------------------
BEGIN CASE
CASE sColName = "CODE_TYPE"
* check for errors first, let default handle ok condition
BEGIN CASE
* value is missing/null?
Case IsNull(sActualValue) OR Len(sActualValue) = 0
ssColRejectReason := "missing/null value;"
RejectSeverity = FAILROW
* value too short?
Case Len(sActualValue) < iMinExpectedValueLength
sColRejectReason := "value too short (" : Len(sActualValue) : ")" : " [" : sActualValue : "];"
iColRejectSeverity = FAILROW
* value too long?
Case Len(sActualValue) > iMaxExpectedValueLength
sColRejectReason := "value too long (" : Len(sActualValue) : ")" : " [" : sActualValue : "];"
iColRejectSeverity = FAILROW
* value not expected?
Case bExpectedValueFound = @FALSE
sColRejectReason := "expected (" : sActualValue : ") <>" : " [" : sExpectedValues : "];"
iColRejectSeverity = FAILROW
* must be ok then
Case @True
Ans = sActualValue; * set return value to plug in outbound column
iColRejectSeverity = PASSED;
END Case
CASE sColName = "FACILITY_CODE"
* check for errors first, let default handle ok condition
BEGIN CASE
* value is missing/null?
Case IsNull(sActualValue) OR Len(sActualValue) = 0
sColRejectReason := "missing/null value;"
iColRejectSeverity = FAILROW
* value too short?
Case Len(sActualValue) < iMinExpectedValueLength
sColRejectReason := "value too short (" : Len(sActualValue) : ")" : " [" : sActualValue : "];"
iColRejectSeverity = FAILROW
* value too long?
Case Len(sActualValue) > iMaxExpectedValueLength
sColRejectReason := "value too long (" : Len(sActualValue) : ")" : " [" : sActualValue : "];"
iColRejectSeverity = FAILROW
* value not expected?
Case bExpectedValueFound = @FALSE
sColRejectReason := "expected (" : sActualValue : ") <>" : " [" : sExpectedValues : "];"
iColRejectSeverity = FAILROW
* must be ok then
Case @True
Ans = sActualValue; * set return value to plug in outbound column
iColRejectSeverity = PASSED
END Case
CASE sColName = "SALES_CODE"
* check for errors first, let default handle ok condition
BEGIN CASE
* value null?
Case IsNull(sActualValue) = @TRUE OR Len(sActualValue) = 0
sColRejectReason := "missing/null value;"
iColRejectSeverity = FAILROW
* value too short?
Case Len(sActualValue) < 1
sColRejectReason := "value too short (" : Len(sActualValue) : ")" : " [" : sActualValue : "];"
iColRejectSeverity = FAILROW
* value too long?
Case Len(sActualValue) > 9
sColRejectReason := "value too long (" : Len(sActualValue) : ")" : " [" : sActualValue : "];"
iColRejectSeverity = FAILROW
* value not expected?
Case bExpectedValueFound = @FALSE
sColRejectReason := "expected (" : sActualValue : ") <>" : " [" : sExpectedValues : "];"
iColRejectSeverity = FAILROW
* no? must be ok then
Case @True
Ans = sActualValue; * set return value to plug in outbound column
iColRejectSeverity = PASSED
END Case
END CASE
* --------------------------------------------------------------------------
* if the column fails the row:
* increment the number of errors encountered
* add it to the error log dynamic array, unless max errors already reached
* --------------------------------------------------------------------------
IF iColRejectSeverity = FAILROW THEN
ErrCount += 1; * increment total errors (all rows)
IF ErrCount < MaxErrCount THEN
RowErrCount += 1; * increment # errors for this row
s = SetStageVar(SV_ROWERRORCOUNT, RowErrCount); * re-save the row error count
sMsg = sMsgLocation : "." : sColName : ".(" : iColRejectSeverity : ")." : sColRejectReason : ".RowErrCount=(" : RowErrCount : ").AllErrCount=(" : ErrCount : ")"
s = UtilityMessageToLog(sMsg)
END
* --------------------------------------------------------------------------
* save column error metadata in hashed dynamic array in COMMON for later
* compilation by the reject output link
* --------------------------------------------------------------------------
ErrLog<ErrCount,1> = sColName
ErrLog<ErrCount,2> = iColIndex
ErrLog<ErrCount,3> = iInputRowNo
ErrLog<ErrCount,4> = sActualValue
ErrLog<ErrCount,5> = iColRejectSeverity
ErrLog<ErrCount,6> = sColRejectReason
END
Return (Ans)
Next, is the routine to fire in the Reject Link...
Code: Select all
* Interface: FUNCTION TblxxxRowCK(InputRowNo)
Common /TblxxxLog/ CommonInitialized, ErrLog, ErrCount, MaxErrCount, RowErrCount, PrevInputRowNumber
* Transformation Interface: TblxxxReject(%InputRowNumber%)
$INCLUDE DSINCLUDE JOBCONTROL.H
Equate RoutineName To 'TblxxxRowCK'
* ------------------------------------------------------------------
* Define external function links here
* ------------------------------------------------------------------
DEFFUN GetStageVar(varname) Calling "DSGetVar"
DEFFUN SetStageVar(varname, varvalue) Calling "DSSetVar"
DEFFUN UtilityMessageToLog(msg) Calling "DSX.UtilityMessageToLog"
* ------------------------------------------------------------------
* Define useful stage variables here
* ------------------------------------------------------------------
Equate SV_ROWREJECTMSG To "InRejectMsg"
Equate SV_ROWERRORCOUNT To "RowErrCount"
* ------------------------------------------------------------------
* Obtain local copies of stage variable values
* ------------------------------------------------------------------
svRejectMsg = GetStageVar(SV_ROWREJECTMSG)
svRowErrCount = GetStageVar(SV_ROWERRORCOUNT)
* ------------------------------------------------------------------
* constants used to indicate column action state
* ------------------------------------------------------------------
Equate PASSED To 0
Equate FAILSTAGE To -1
Equate FAILROW To 1
Equate FAILCOLUMN To 2
Equate DEFAULTED To 3
* ------------------------------------------------------------------
* Format of dynamic array in COMMON area...
* ------------------------------------------------------------------
* Original storage statement to dynamic array in COMMON area
* ErrLog<1, 1> = Column/Field Name
* ErrLog<1, 2> = Column Ordinal Value (1 relative)
* ErrLog<1, 3> = Input Row Number
* ErrLog<1, 4> = Actual Input Value
* ErrLog<1, 5> = Reject Severity Code
* ErrLog<1, 6> = Reject Reason Description
* Values in each column entry in the ErrLog<> dynamic array
Equate ICOLNAME To 1
Equate ICOLINDEX To 2
Equate IROWNUM To 3
Equate ICOLVALUE To 4
Equate ISEVCODE To 5
Equate IREASON To 6
* ------------------------------------------------------------------
* build location information
* ------------------------------------------------------------------
sMsgLocation = DSHostName : "." : DSProjectName : "." : DSJobName : "." : DSStageName : "." : DSLinkName : "." RoutineName
* status = UtilityMessageToLog(sMsg)
* ------------------------------------------------------------------
* determine if the named COMMON data area is initialized, bail if not
* ------------------------------------------------------------------
IF CommonInitialized = 0 THEN ANS=""
else
* ------------------------------------------------------------------
* Make local copies of args
* ------------------------------------------------------------------
iInputRowNo = InputRowNo
iErrorCount = ErrCount
ErrorCode = 0 ;* default to pass; set non-zero to stop the stage/job
sRejectMsg = ""; * initialize the reject msg
IF svRowErrCount > 0 THEN
FOR i = 1 TO iErrorCount
IF ErrLog<i,IROWNUM> = iInputRowNo THEN
iRejectSeverity = ErrLog<i,ISEVCODE>
sRejectReason = ErrLog<i,IREASON>
sRejectMsg := "Sev:" : iRejectSeverity : " |Col: " : ErrLog<i,ICOLNAME> : " |Msg: " : sRejectReason
END
NEXT
END
* store the composite error message back to the stage variable
s = SetStageVar(SV_ROWREJECTMSG, sRejectMsg)
Ans = sRejectMsg
END; * IF CommonInitialized = 0 THEN...
Additional metadata captured in the Reject link include Input row number, event datetimestamp, and the message compiled in the immediate routine above, along with DSHostname, DSProjectName, DSStageName, DSLinkName, RoutineName. I will probably add the input source name as a job parameter in the near future.
Best regards,
--Don