Label Value Processing

Post by **DSwarrior** » Wed Feb 10, 2016 3:37 pm

Has anyone used DataStage to process label/value transactions?
I am reading a DB2 extraction file from a tax system.
The tax system has various forms and schedules, each with an identifier.
Each form and schedule has numbered boxes.
A form/schedule identifier together with a box number forms a label; the corresponding value is whatever was entered in that box.
This combination produces 10,000 different labels. The labels are fixed in size, but the values are not: they can be any length. The file is delimited.
The file contains transactions, each made up of a header record holding a client identifier and other data, followed by multiple client records that contain the many different label/value entries.
Only updates/changes are extracted into the file.
The extraction file is Variable Blocked (VB).
Does anyone have experience processing this type of file with DataStage?
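To illustrate what "Variable Blocked" means for whatever reads the file, here is a minimal Python sketch (outside DataStage) that splits a VB byte stream into individual records. It assumes each record starts with the usual 4-byte Record Descriptor Word (a 2-byte big-endian length that includes the RDW itself, then 2 reserved bytes) and that the transfer kept the RDWs but not the Block Descriptor Words. The `H|`/`D|` payloads and the `make_record` helper are hypothetical, and real mainframe data would probably still need decoding from EBCDIC (for example with Python's `cp037` codec) before splitting on the delimiter.

```python
import struct
from typing import Iterator


def iter_vb_records(data: bytes) -> Iterator[bytes]:
    """Yield the payload of each RDW-prefixed record in `data`.

    Assumes every record starts with a 4-byte Record Descriptor Word:
    a 2-byte big-endian length that includes the RDW itself, followed by
    2 reserved bytes. Block Descriptor Words are assumed to be absent.
    """
    pos = 0
    while pos < len(data):
        if pos + 4 > len(data):
            raise ValueError(f"truncated RDW at offset {pos}")
        (length,) = struct.unpack(">H", data[pos:pos + 2])
        if length < 4 or pos + length > len(data):
            raise ValueError(f"bad record length {length} at offset {pos}")
        yield data[pos + 4:pos + length]  # skip the RDW, keep the payload
        pos += length


def make_record(payload: bytes) -> bytes:
    """Hypothetical helper: prefix a payload with an RDW (for testing only)."""
    return struct.pack(">HH", len(payload) + 4, 0) + payload


sample = make_record(b"H|C001") + make_record(b"D|FRMA0001|1500.00")
print(list(iter_vb_records(sample)))
# [b'H|C001', b'D|FRMA0001|1500.00']
```

Raising on a bad length, rather than skipping ahead, is deliberate: a misread RDW usually means the file still contains BDWs or was transferred in text mode.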

Post by **ray.wurlod** » Wed Feb 10, 2016 4:09 pm

So, ignoring the header for the moment, all you really need to do is process label/value pairs. This is very similar to processing name/value pairs (for example, in Hadoop), so it should be very straightforward.
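As a rough illustration of the name/value idea, here is a minimal Python sketch of the usual pivot: tall rows of (client, label, value) become one wide record per client. It assumes each pair has already been tagged with its client identifier; the labels and values shown are hypothetical, not real tax form codes.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Triple = Tuple[str, str, str]  # (client_id, label, value)


def pivot_pairs(rows: Iterable[Triple]) -> Dict[str, Dict[str, str]]:
    """Pivot tall label/value rows into one wide record per client.

    Only labels actually present for a client appear in its record,
    which suits an extract that carries only changed boxes.
    """
    wide: Dict[str, Dict[str, str]] = defaultdict(dict)
    for client_id, label, value in rows:
        wide[client_id][label] = value  # a repeated label keeps the last value
    return dict(wide)


rows = [
    ("C001", "FRMA0001", "1500.00"),
    ("C001", "FRMA0002", "Y"),
    ("C002", "SCHB0010", "42"),
]
print(pivot_pairs(rows))
# {'C001': {'FRMA0001': '1500.00', 'FRMA0002': 'Y'}, 'C002': {'SCHB0010': '42'}}
```

Keeping each client's record as a sparse mapping, rather than a fixed 10,000-column row, avoids materializing every possible label when most boxes are empty.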

Use stage variables to capture the values from the header rows to build your record structures.
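The stage-variable technique works because a Transformer's stage variables keep their values from one row to the next. The Python sketch below mimics that outside DataStage: a variable remembers the client identifier from the most recent header and stamps it onto each following detail row. The record layout (`H|<client_id>` for headers, `D|<label>|<value>` for details, `|` as delimiter) is a hypothetical assumption, since the actual extract layout isn't given.

```python
from typing import Iterable, Iterator, Optional, Tuple


def attach_header(lines: Iterable[str], delimiter: str = "|") -> Iterator[Tuple[str, str, str]]:
    """Yield (client_id, label, value) for every detail line.

    Hypothetical layout: header lines are "H|<client_id>[|...]" and detail
    lines are "D|<label>|<value>".
    """
    current_client: Optional[str] = None  # retained across lines, like a stage variable
    for line in lines:
        fields = line.split(delimiter, 2)  # at most 3 fields; extra delimiters stay in the last one
        if fields[0] == "H" and len(fields) >= 2:
            current_client = fields[1]
        elif fields[0] == "D" and len(fields) == 3:
            if current_client is None:
                raise ValueError(f"detail record before any header: {line!r}")
            yield current_client, fields[1], fields[2]
        else:
            raise ValueError(f"unrecognized record: {line!r}")


lines = ["H|C001", "D|FRMA0001|1500.00", "D|FRMA0002|Y", "H|C002", "D|SCHB0010|42"]
print(list(attach_header(lines)))
# [('C001', 'FRMA0001', '1500.00'), ('C001', 'FRMA0002', 'Y'), ('C002', 'SCHB0010', '42')]
```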

Use the Change Capture stage to detect whether the record is new, needs to be updated, or needs to be flagged as no longer in use (or use the Slowly Changing Dimension stage to achieve a similar result).
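The comparison that change capture performs can be sketched in a few lines of Python. This is not the DataStage stage itself, just a minimal illustration under the assumption that rows are keyed by (client_id, label). Because the extract carries only updates/changes, a label missing from it does not mean the box was cleared, so the sketch only infers deletes when the incoming set is flagged as a complete snapshot. The change-type names are illustrative, not the stage's actual change codes.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, str]  # (client_id, label)


def capture_changes(before: Dict[Key, str], after: Dict[Key, str],
                    after_is_full_snapshot: bool = False) -> List[Tuple[str, Key, str]]:
    """Classify incoming rows as inserts or edits against the current data.

    Unchanged rows are dropped. Deletes are only reported when `after`
    is a complete snapshot rather than a changes-only extract.
    """
    changes: List[Tuple[str, Key, str]] = []
    for key, value in after.items():
        if key not in before:
            changes.append(("insert", key, value))
        elif before[key] != value:
            changes.append(("edit", key, value))
    if after_is_full_snapshot:
        for key, value in before.items():
            if key not in after:
                changes.append(("delete", key, value))
    return changes


before = {("C001", "FRMA0001"): "1500.00", ("C001", "FRMA0002"): "Y"}
after = {("C001", "FRMA0001"): "1750.00", ("C002", "SCHB0010"): "42"}
print(capture_changes(before, after))
# [('edit', ('C001', 'FRMA0001'), '1750.00'), ('insert', ('C002', 'SCHB0010'), '42')]
```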