Best way to search through a DataSet

JPalatianos
Premium Member
Posts: 306
Joined: Wed Jun 21, 2006 11:41 am

Post by JPalatianos »

Hi,
I have been asked by our development team whether there is an alternate or better way to search through a Data Set. I originally pointed them to the Data Set Management utility, and they came back with "searching through millions of rows would take hours with the limited row display". Besides dumping to a text file or staging table, is there a way to easily query a Data Set for debugging purposes?

We are running version 8.0.1 on Windows and are in the process of upgrading to 8.7 on Windows.

Thanks - - John
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Short answer: no.

Probably the fastest would be a parallel job that reads the Data Set and uses a Transformer stage to effect the search. You can run this with more nodes than exist in the Data Set.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
SURA
Premium Member
Posts: 1229
Joined: Sat Jul 14, 2007 5:16 am
Location: Sydney

Post by SURA »

What Ray said is the best way. You can also use the orchadmin command to dump the data to a text file, then either open it in Excel or use grep (MKS Toolkit) to find the name in that text file.

Again, it depends on your data volume, so you need to decide.
Thanks
Ram
----------------------------------
Revealing your ignorance is fine, because you get a chance to learn.
miwinter
Participant
Posts: 396
Joined: Thu Jun 22, 2006 7:00 am
Location: England, UK

Post by miwinter »

Use the debugger: set a breakpoint on the Data Set's output link, with a condition that matches the data you're searching for.
Mark Winter
Nothing appeases a troubled mind more than good music
JPalatianos
Premium Member
Posts: 306
Joined: Wed Jun 21, 2006 11:41 am

Post by JPalatianos »

I appreciate all the suggestions!!
rameshrr3
Premium Member
Posts: 609
Joined: Mon May 10, 2004 3:32 am
Location: BRENTWOOD, TN

Post by rameshrr3 »

I vote hands down for orchadmin with the dump option, piped to a grep condition. The Data Set Management utility does not scale. If you are on the newer versions, you can also use the debugger.
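
If grep is not handy on the Windows server, the same filtering step can be sketched in a few lines of Python once the Data Set has been dumped to text. This is only an illustration under stated assumptions: the dump is taken to be one row per line with a comma delimiter, and `search_rows` and its parameters are hypothetical names, not part of DataStage or orchadmin.

```python
import re
from typing import Iterable, Iterator, Optional, Tuple


def search_rows(
    lines: Iterable[str],
    pattern: str,
    delim: str = ",",            # assumed delimiter of the dumped text
    field: Optional[int] = None  # zero-based column to test; None = whole row
) -> Iterator[Tuple[int, str]]:
    """Yield (row_number, row_text) for every row matching the regex."""
    regex = re.compile(pattern)
    for row_number, line in enumerate(lines, start=1):
        row = line.rstrip("\r\n")
        if field is None:
            target = row
        else:
            parts = row.split(delim)
            if field >= len(parts):
                continue  # row has fewer columns than expected; skip it
            target = parts[field]
        if regex.search(target):
            yield row_number, row


# Example use against a dump file (the file name is hypothetical):
#
#   with open("dump.txt", encoding="utf-8") as fh:
#       for number, row in search_rows(fh, r"^SMITH$", field=1):
#           print(number, row)
```

Because the rows are streamed one at a time, memory use stays flat however many millions of rows the dump holds. The run time is still bounded by how long it takes to read the file once.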
sendmkpk
Premium Member
Posts: 97
Joined: Mon Apr 02, 2007 2:47 am

Post by sendmkpk »

ray.wurlod wrote:Probably the fastest would be a parallel job that reads the Data Set and uses a Transformer stage to effect the search. You can run this with more nodes than exist in the Data Set.
So, Ray, did you mean that we could write the Data Set using one configuration file and read it with another? How is that possible?

Regards,
Praveen
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Yes, that's what I'm saying. A copy of the configuration file used to write the Data Set is stored in its descriptor file, and this can be used to read the Data Set (the data then have to be re-partitioned automatically onto the nodes of the currently active configuration file). DataStage looks after that for you. If you prefer to use the orchadmin command, specify the -x option.