transformer lookup

dsscholar · Post by **dsscholar** » Tue Nov 01, 2011 1:48 pm

please explain lookup

Please clarify. Thanks in advance.

jwiles · Post by **jwiles** » Tue Nov 01, 2011 2:01 pm

Based on your description and the example record you provided, it appears that you will be reading a basic fixed-width sequential text file (the ".dat" extension is largely irrelevant--the actual data itself is more important).

1) You can read it as you describe, but that is generally more work
2) You can insert a column import stage before the transformer to parse out the columns
3) You can define your columns in the sequential file stage as you normally would for any sequential text file.

Have you been provided with any layout for the file (column definitions)? What about the record delimiter?

Regards,

ray.wurlod · Post by **ray.wurlod** » Tue Nov 01, 2011 5:31 pm

You can import the table definition for the sequential file, marking it as fixed width and providing the column widths (on the Format tab) and column names (on the Define tab).

dsscholar · Post by **dsscholar** » Tue Nov 01, 2011 9:49 pm

Thanks guys.

Record delimiter is newline. I will try both the ways and let u know.

dsscholar · Post by **dsscholar** » Thu Nov 03, 2011 9:58 am

Hi ray,

The column width will be varying. As u suggested, it should be fixed right.. How to proceed in case of varying column widths

jwiles,

How to use column import in case of varying column widths..

ThankS in advance

dsscholar · Post by **dsscholar** » Thu Nov 03, 2011 10:00 am

Hi all,

No header in the sequential file. New line is the record delimiter.

And in case of using transformer, i dont find a function corresponding to substr. Can u please help me in that.

Thanks in advance

BI-RMA · Post by **BI-RMA** » Thu Nov 03, 2011 10:07 am

You need to specify a delimiter or (if this is longer than 1 character) a delimiter string, to tell DataStage where to split up the records into columns.

Of course, you will have to remove the property "Record length=fixed"

The sequential file stage is documented in Parallel Job Developer Guide.

dsscholar · Post by **dsscholar** » Thu Nov 03, 2011 10:43 am

BI-RMA,

i mentioned varying column widths and not varying record length.

Eg : Total length : 50

ten columns not with equal length 5 forms the 50. one will be with 3,other with 7 like that.

And source wont be having any field delimiter then how to specify the delimiter in case of column import stage for this scenario.

As ray suggested, i imported the file with fixed length option,then in the format tab column width option is there and i dont know what value to give as each column will be having different different lengths.

This is my issue now. clarify or please suggest some other solution.

Thanks in advance,

dsscholar · Post by **dsscholar** » Thu Nov 03, 2011 10:44 am

BI-RMA,

i mentioned varying column widths and not varying record length.

Eg : Total length : 50

ten columns not with equal length 5 forms the 50. one will be with 3,other with 7 like that.

And source wont be having any field delimiter then how to specify the delimiter in case of column import stage for this scenario.

As ray suggested, i imported the file with fixed length option,then in the format tab column width option is there and i dont know what value to give as each column will be having different different lengths.

This is my issue now. clarify or please suggest some other solution.

Thanks in advance,

jwiles · Post by **jwiles** » Thu Nov 03, 2011 10:53 am

Transformer substring function: string_name[start,length]

such as: Name[1,20]

I don't recall exactly where in the docs that function is shown...probably under the Server transformer...but it is valid within parallel transformer as well.

For the Column Import stage, setting it up is almost exactly the same as you would in the Sequential File stage. My question to you is: If you are working with a fixed-width file, would you truly have variable-width columns in that fixed-width file?

If you are referring to the width of the actual data within the string column (i.e. maybe

Code: Select all

"MYCITY            "

where "MYCITY" is the actual data, padded with spaces into a fixed-width column), use a transformer to trim and convert to a VarChar field after importing as a fixed-width field.

Regards,

jwiles · Post by **jwiles** » Thu Nov 03, 2011 11:04 am

i dont know what value to give as each column will be having different different lengths

Are you saying that the width of these non-delimited columns may change from record to record, and they happen to add up to 50? Is there anything within the record which can indicated what the widths actually are within a particular record?

If the columns can change width with each record, and you have no width-identifying data, you will likely need some custom logic, whether in a transformer, custom operator, BuildOp or an external source or external filter.

Regards,

dsscholar · Post by **dsscholar** » Thu Nov 03, 2011 1:04 pm

jwiles wrote:
i dont know what value to give as each column will be having different different lengths
Are you saying that the width of these non-delimited columns may change from record to record, and they happen to add up to 50? Is there anything within the record which can indicated what the widths actually are within a particular record?

If the columns can change width with each record, and you have no width-identifying data, you will likely need some custom logic, whether in a transformer, custom operator, BuildOp or an external source or external filter.

Regards,

No jwiles. What u said previously is correct. Column width is not varying. In my sample, first 4 characters -> corresponds to JE_YEAR, second 5,6 characters -> JE_MONTH and 7th character -> JE_WEEK_ID. Column length is fixed. For all the records the column width will be same. I just know the data in the source file and i want it to be separated as columns in the target dataset.

I asked column width thing in different sense regarding ray's suggestion. I tried to do that.

1) I imported the file with fixed width option marked.
2) i was able to give all the 24 column names with length according to my requirement.
3) But i dont understand one point. he asked to give column width in the format tab. What can i give in that. Because the i want the first column to have length as 4 and second column will have as 2 and third as 1 and so on...

According to your suggestion :

What i did is initially, i extracted the data with one column

(Column name : Data:Varchar:387(lengh of each record) )

In sequential file stage i dint specify any format, for delimiter i gave "none". I was able to read the record properly. each with 387 characters. The data will be having spaces also. Once the column separation is done. And for the columns which have only spaces, i have to replace with null.

And in column import, i first tried by specifying 3 columns JE_YEAR,JE_MONTH and JE_WEEK_ID with lenth as 4,2,1. But the loaded dataset was not correct.

Only the first column contained the result i.e JE_YEAR -> 2010 and for remaining 2 columns it was blank. It will work only if specify all the columns?

Or simply if use the sequential file and give all the column names with lengh as per requirement it will work ah? Header plays a role right. Without that, this case is not possible right?

Please suggest.

jwiles · Post by **jwiles** » Thu Nov 03, 2011 1:17 pm

Headers and trailers are not an absolute necessity. Your primary requirement is to correctly define the metadata so that the entire record is processed.

The following is true for either Sequential File or Column Import:
If you define only a few of your actual columns (such as only the first three columns--10 bytes--of a 50 byte record), you must include an additional column which will capture the remainder of the record. For example:

Column1 Char(3)
Column2 Char(4)
Column3 Char(3)
RestOfRecord Char(40)

If you're familiar with the COBOL language, think of the process as having to define the classic "FILLER" columns in a COBOL File Descriptor: Whether or not you're going to use the entire record, you must account for all of it in your schema/metadata. If you don't want "RestOfRecord", DROP it with a modify stage after the import/seqfile stage.

Why are you only specifying a few columns in Column Import when you specify all of them in Sequential File? The requirements for both stages are the same, which is why you have almost identical user interfaces for them (Format Tab with the same options and the columns tab). Underneath the covers they are pretty much the same operator.

dsscholar · Post by **dsscholar** » Thu Nov 03, 2011 9:19 pm

Thanks for your timely help jwiles. Will try it out today.

ray.wurlod · Post by **ray.wurlod** » Fri Nov 04, 2011 1:24 am

When you are importing the table definition you give the column widths on the Format tab. These are kept (and displayed) in the Display Width column of a Columns grid in a saved table definition. All you need to do on the Define tab is provide column (field) names.