Load entire data record into array and process in routine

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
sbass1
Premium Member
Premium Member
Posts: 211
Joined: Wed Jan 28, 2009 9:00 pm
Location: Sydney, Australia

Load entire data record into array and process in routine

Post by sbass1 »

Hi,

I have a job where I need to do data validations on about 50 columns, and reject the row if any column fails validation.

My current approach is something like (incoming link is "Extract"):

Stage Variables C1-C50, plus Audit and InvalidRecord

C1: If Not(Extract.Var1 Matches "FOO") Then "Var1;" Else \\
C2: If IsNull(Extract.Var2) Then "Var2;" Else \\
...
C50: If Not(Extract.VarWhatever="T" and Extract.Var50="Y") Then "Var50;" Else \\

(It doesn't really matter what the validations are. The approach is to set the stage variable to the name of the input column if it fails validation, otherwise set it to the empty string).

Audit: C1:C2:C3:...:C50 * just concatenate the stage vars together
InvalidRecord: Len(Audit) > 0

I'm finding this approach is too difficult to maintain.

What I'd like to do is the following:

* Load the entire input row into an array
* Pass that array as a parameter to a routine
* The routine will contain all the code that does the validations
* The routine passes back those columns that fail validation, and the rest is as it is currently

Perhaps pass two arrays to the routine: the name of the input column, and the value of the input column. Or perhaps a two dimensional array, whichever approach is easiest to maintain. Otherwise I'll need to "know" inside the routine which array element maps to which column.

Soooo...

1) How do I load the input row into an array and pass that to a routine?
2) Does anyone have sample code that passes an array as input into a routine and parses it? I can hit the doc, but if you have a code sample that will save me some time.

Thanks,
Scott
Last edited by sbass1 on Mon Apr 13, 2009 10:46 pm, edited 1 time in total.
Some people are the Michael Jordan's of Datastage. I'm more like Muggsy Bogues :-)
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

1) If you find fifty stage variables and one that concatenates them all into a string, then you'll find one that does the same plus inserts dynamic array delimiters even more difficult to maintain. And then there's maintaining the routine itself. The parsing into fifty fields and directing each to the correct validation code would be horrendous to maintain, surely.

2) I don't have any sample code, because I've always believed in doing it in stage variables. I think that, well documented, this is an eminently maintainable approach.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
sbass1
Premium Member
Premium Member
Posts: 211
Joined: Wed Jan 28, 2009 9:00 pm
Location: Sydney, Australia

Post by sbass1 »

I coded all the "mandatory" validations per my initial posting. I've now found I missed a number of "additional constraint" validations from the specification.

I'd like the code to be written in the same order as the source data column specification in the spec, and I'd like the Audit field (written to the Rejects file) to have the validations in the same order as the columns in the source file.

Being so pedantic in my coding also helps ensure I've not forgotten a validation check from the spec.

If there was a way I could cut and paste the code from a block of stage variables to another, that would be acceptable. In other words, "move" the code from C30-C50 to C40-C60, for example.

If I don't go with a routine to do the validation, I can see me Exporting to DSX, editing the DSX, and Importing.

Useful constructs in DS would include:

* Easy declaration of all columns in the input record.
In SAS: array my_array{*} _ALL_; (assuming all columns are the same data type)

* Easy concatenation of trimmed data with a given delimiter.
In SAS: cat_string = catx("~",_ALL_); (hmmm, perhaps I can create this function...how do I deal with an undefined number of parameters???)

* Referencing a data value by abstraction.
IOW, if I have the string "Extract.Var1", I want the value of the data element reference by that string. In pseudocode, something like:

value = eval("Extract.Var1"), where eval is the magic function...

If I could pass a list of columns to a routine, that routine parsed the list of columns, and was able to derive the value of each column, that would be really cool.

If there are ways to easily do the above in DS, please let me know.

Another approach may be to read the data as one long delimited string, passing that string to the validation routine, "manually" parsing the string by position, etc.

This is similar to what Vincent recommended here, albeit for a different problem: viewtopic.php?t=125895&highlight=

If I go this route, I'll likely keep the column name and position in an array loaded in shared memory. This way if a new column ever gets added, I just make the change in one spot.

I'm not convinced that this would be more difficult to maintain, but I'm open to all input.

Thanks,
Scott
Some people are the Michael Jordan's of Datastage. I'm more like Muggsy Bogues :-)
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Did you not know that you can rearrange the order of any grid in DataStage?

Select the "key" column - usually the leftmost - and slide it up or down, usually using the right mouse button for the drag/drop but just occasionally the left (ya gotta love the consistency).

There is another way, in server jobs, but it's technically illegal (because it's reverse engineering). Even so, if you search the forum for STAGECOM.ARR you might find something interesting. Then search the DataStage BASIC manual for techniques for converting between dimensioned and dynamic arrays - STAGECOM.ARR is one of the former.

And there's no guarantee at all that this technique will still work in version 8 and beyond.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
sbass1
Premium Member
Premium Member
Posts: 211
Joined: Wed Jan 28, 2009 9:00 pm
Location: Sydney, Australia

Post by sbass1 »

ray.wurlod wrote:Did you not know that you can rearrange the order of any grid in DataStage?

Select the "key" column - usually the leftmost - and slide it up or down, usually using the right mouse button for the drag/drop but just occasionally the left (ya gotta love the consistency).
Create a new job - you can leave it untitled/unsaved
Drop a transformer on the canvas
Create StageVars C1-C4
For C1 & C2, type "asdf" (doesn't matter that it's invalid)
Now try dragging and/or cut and pasting "asdf" from C1&C2 to C3&C4, leaving C1 & C2 now blank.
Doing this by row is unacceptable - it has to be a block move.

If there is a way to do this, then I apologize for being dense. Please let me know.

Thanks...
Some people are the Michael Jordan's of Datastage. I'm more like Muggsy Bogues :-)
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Block move is not possible without hacking either the DS_JOBOBJECTS record or the DSX file.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Post Reply