Reading RTF Data

Archive of postings to DataStageUsers@Oliver.com. This forum is intended only as a reference and cannot be posted to.

Moderators: chulett, rschirm

Locked
admin
Posts: 8720
Joined: Sun Jan 12, 2003 11:26 pm

Post by admin »

Hi all,

I have a requirement to load data from thousands of RTF documents into a database. The data is formatted in the files such that one record is laid out over one page. I wonder if I can read it through DataStage.

Needless to say, I cannot open each file and save it as a sequential file. I tried converting the files to other formats such as XML or HTML, but the tag info is of little help.

Any suggestions as to how best to handle such a data source? Any inputs from the community would be of great help.

Thanks in advance,

Regards,

Vivek Pandey
==^===========================================================
Consultant - Business Intelligence
SONATA SOFTWARE LTD.
193, R.V. Road,
Bangalore - 560004 INDIA
Ph: 91-80-6567492/6/7, 6561063/4/7, 6562068, 6568495
Fax: 91-80-6567487
Web: http://www.sonata-software.com
==^===========================================================
A foolish consistency is the hobgoblin of little minds, adored by little statesmen and philosophers and divines. With consistency a great soul has simply nothing to do. - Ralph Waldo Emerson



*********************************************************************
Disclaimer: The information in this e-mail and any attachments is confidential / privileged. It is intended solely for the addressee or addressees. If you are not the addressee indicated in this message, you may not copy or deliver this message to anyone. In such case, you should destroy this message and kindly notify the sender by reply email. Please advise immediately if you or your employer does not consent to Internet email for messages of this kind.
*********************************************************************
admin
Posts: 8720
Joined: Sun Jan 12, 2003 11:26 pm

Post by admin »

Hi all,

I had posted a query about reading data in .rtf files, and with some assistance from Lamont I have been able to accomplish it. Here is how I did it. Hopefully it may be of interest to someone.

I created the DataStage jobs to load the .rtf data into the database, all in DataStage 4.1, using nothing more than the out-of-the-box, drag-and-drop functions of the DataStage development environment. The DataStage job parsed the .rtf file and filtered out the unwanted tags and info. This was more difficult than it looked because there are no real tags defining the data, and the data could be located on different numbers of lines or in different positions. DataStage easily handled the variable nature of the data. That gave us a file that listed records like:
x1
x2
x3
x4
y1
y2
y3
y4
y5

which we transformed to
x1 x2 x3 x4
y1 y2 y3 y4 y5
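
For readers who want to see the idea outside DataStage, here is a rough Python sketch of the same two steps: filter out the RTF markup, then turn one value per line into one record per line. It is not the DataStage job itself. It assumes very simple RTF in which each value ends with \par and each record ends with \page (one record per page, as described in the original question); the function name rtf_to_records and the sample string are made up for illustration. It does not handle font tables, hex escapes or other real-world RTF features.

```python
import re

# An RTF control word such as \par or \fs24 (optional numeric argument and
# one optional trailing space), or a control symbol such as \~.
CONTROL = re.compile(r"\\[a-zA-Z]+-?\d* ?|\\[^a-zA-Z]")


def rtf_to_records(rtf):
    """Split very simple RTF into records (one per page), each a list of values."""
    records = []
    # Assumption: a \page control word separates records.
    for page in re.split(r"\\page\b ?", rtf):
        values = []
        # Assumption: a \par control word ends each value.
        for para in re.split(r"\\par\b ?", page):
            text = CONTROL.sub("", para).replace("{", "").replace("}", "").strip()
            if text:
                values.append(text)
        if values:
            records.append(values)
    return records


# Hypothetical sample with two records of different lengths.
sample = (r"{\rtf1\ansi x1\par x2\par x3\par x4\par\page "
          r"y1\par y2\par y3\par y4\par y5\par}")

for record in rtf_to_records(sample):
    print(" ".join(record))
# x1 x2 x3 x4
# y1 y2 y3 y4 y5
```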

That could then be loaded with some more logic to take care of the differing number of columns in each record. The thousands of files can either be streamed through the Folder stage or called through Job Control.
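
As an illustration only, the two points above (records with differing numbers of columns, and many input files) might look like this in Python. This is a sketch under assumptions, not what the Folder stage or Job Control does internally: the helper names list_rtf_files and pad_records are hypothetical, and padding short records with empty strings is just one possible way to handle the missing columns.

```python
from pathlib import Path


def list_rtf_files(folder):
    """Return the .rtf files in a folder in a stable (sorted) order."""
    return sorted(Path(folder).glob("*.rtf"))


def pad_records(records, filler=""):
    """Pad every record to the width of the longest one."""
    width = max((len(r) for r in records), default=0)
    return [list(r) + [filler] * (width - len(r)) for r in records]


rows = pad_records([["x1", "x2", "x3", "x4"],
                    ["y1", "y2", "y3", "y4", "y5"]])
for row in rows:
    print("|".join(row))
# x1|x2|x3|x4|
# y1|y2|y3|y4|y5
```

Each padded row then has the same number of columns, so it can be written to a delimited file and loaded with a fixed column list.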

Creating this job took only a few hours.

Regards,

Vivek Pandey

> -----Original Message-----
> From: Vivek Pandey. [SMTP:vive@sonata-software.com]
> Sent: Wednesday, November 07, 2001 1:32 PM
> To: datastage-users@oliver.com
> Subject: Reading RTF Data