Regarding Ontology File

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

additiya
Participant
Posts: 3
Joined: Thu May 29, 2008 4:13 am

Regarding Ontology File

Post by additiya »

Please explain how to load a .obo extension file.

I have a .obo extension file. The file structure is given below:
[Term]
Id : 123 : abc
name: abcd
xref : 1:a
xref : 2:b
xref : 3:c
alt_id: a : c
alt_id: a : c
is_a : 456

But I need the data in the structure below:

id       name   xref          alt_id    is_a
123:abc  abcd   1:a,2:b,3:c   a:c,d:e   456

I want to convert it from .obo to .txt format. Please help me; this is very urgent.
Your help will be very much appreciated.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Welcome.

Not a big fan of people just posting input and output samples and then sitting back and waiting for a solution. Show us what you've tried, how you think it needs to be tackled and what issues you've had trying to implement those thoughts. However...

My first suggestion would be to forget about it being any kind of an "ontology" file and just look at the structure. Put into words the process of turning the input into output; that process of verbalizing the requirement can go a long way towards a solution. For example, what I see is a vertical pivot, a conversion of rows to columns. However, not a very simple or straightforward one, unfortunately.

Now, in the Server world this can be easily solved with a hashed file. A transformer that both looks up to and writes to the same (non-cached) hashed file can easily aggregate records like this. Use a stage variable to store the name, as that will always be the key to the lookup. Add it as a discrete column to the data stream and then do the lookup in a second transformer. When the first column says "name" you should get a miss on the lookup, and then write out an "empty" record to the hashed file with just the name field (your hashed file key) filled in. As each subsequent lookup for that same name comes along and succeeds, look at the first column again to see which column to update. If the looked-up column is null, simply push the value there. Not null? Append with a delimiter in front of it. When complete, source from the hashed file and write to a flat file. Sort things on the way if that matters.
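Purely as a sketch of that idea (the link names DSLinkIn and DSLinkRef and the Tag/Value columns below are made-up placeholders, not anything from an actual job), the key stage variable and one of the append derivations in that second transformer might look roughly like this:

Code: Select all

svKey (stage variable, the hashed file key):
    If DSLinkIn.Tag = "name" Then DSLinkIn.Value Else svKey

Xref (column written back to the hashed file):
    If DSLinkIn.Tag <> "xref" Then DSLinkRef.Xref
    Else If IsNull(DSLinkRef.Xref) Then DSLinkIn.Value
    Else DSLinkRef.Xref : "," : DSLinkIn.Value

The same pattern repeats for alt_id, is_a and the rest; a complete miss on the lookup is your cue to write the "empty" record keyed on svKey first.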

I don't normally spell things out like this non-Premium style and we don't do any kind of "urgent" here but you got lucky today. First one's on the house. :wink:

Next time, if it is really all that urgent, contact your official support provider. That's why your company pays them those big bucks... make them earn it.
-craig

"You can never have too many knives" -- Logan Nine Fingers
bhasds
Participant
Posts: 79
Joined: Thu May 27, 2010 1:49 am

Post by bhasds »

Hi additiya,

You can read each row of the file as a single column (DSLink4.loc in the derivations below) and then, in the transformer, set up two stage variables:

1. Stage variables:

Code: Select all

SV1:  If Field(DSLink4.loc,":",1) <> SV2
      Then Field(DSLink4.loc,":",2,Dcount(DSLink4.loc,":"))
      Else SV1 : "," : Field(DSLink4.loc,":",2,Dcount(DSLink4.loc,":"))

SV2:  Field(DSLink4.loc,":",1)
2. In the output column derivations, take two columns:

Code: Select all

col1:  Field(DSLink4.loc,":",1)

col2:  SV1
3. Map these two columns to a hashed file and make col1 the key column.


4. From the hashed file, map only col2 to a Sequential File stage.
additiya
Participant
Posts: 3
Joined: Thu May 29, 2008 4:13 am

Post by additiya »

Thank you chulett for your reply.

As per your suggestion I developed a server job, but it is not giving me the desired result.

Job design is given below:

Source file -> Transformer1 -> Transformer2 -> hashed file
                                    |
                           Lookup hashed file

In my file the ID is unique, so I used it as the key and then did the lookup. I extract all columns in Transformer1, then pass them to Transformer2 for the lookup. There I use ID as the lookup key and declare one stage variable Id whose value is the id column coming from Transformer1.

The derivation for the xref column is:
(If DSLink47.Xref <> '' Then DSLink47.Xref : ',' : DSLink49.Xref Else DSLink49.Xref)

DSLink47: source column link
DSLink49: lookup file column link

Please let me know your comments. Thank you.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

First thought - in your derivation, check for null, not an empty string, as they are not the same thing. Other than that, you'd have to explain to us what it is doing and what kind of output you are seeing so we can provide cogent help. Right now all you've basically said is "it doesn't work".
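For example, the xref derivation would be more along these lines (just a sketch, reusing the link names from your post):

Code: Select all

If IsNull(DSLink49.Xref) Then DSLink47.Xref
Else DSLink49.Xref : ',' : DSLink47.Xref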
-craig

"You can never have too many knives" -- Logan Nine Fingers
additiya
Participant
Posts: 3
Joined: Thu May 29, 2008 4:13 am

Post by additiya »

Hi chulett,

Values are repeating. For example, the value of the Xref column comes out as
0,0, 1:a, 2:b, 3:c, 3:c,3:c,3:c

But in the file we have only three xref values for that particular ID: 1:a, 2:b and 3:c.

For a few records (a few IDs) we don't have any xref lines at all, but in the output I am still getting a value in the xref column.

Please let me know if you need any more input from my end. Thank you.
chulett
Charter Member
Charter Member
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Hard to say being on this side of the glass, but I'd guess you're not being selective enough when updating the columns. Only update the column for the record you have received. When it says "xref", only update the xref column. When it says "alt_id", only update the alt_id column. All other columns should be set to their current values so they do not change unless it is their turn.
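Roughly like this for each one, as a sketch only (the RecType column is a stand-in for however your job identifies what kind of line the incoming row came from, which I can't see from here):

Code: Select all

Xref:
    If DSLink47.RecType <> 'xref' Then DSLink49.Xref
    Else If IsNull(DSLink49.Xref) Then DSLink47.Xref
    Else DSLink49.Xref : ',' : DSLink47.Xref

Alt_id:
    If DSLink47.RecType <> 'alt_id' Then DSLink49.Alt_id
    Else If IsNull(DSLink49.Alt_id) Then DSLink47.Alt_id
    Else DSLink49.Alt_id : ',' : DSLink47.Alt_id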
-craig

"You can never have too many knives" -- Logan Nine Fingers