Best way to search through a DataSet

JPalatianos · Post by **JPalatianos** » Fri Jun 07, 2013 7:21 pm

Hi,
I have been asked by our development team whether there is an alternate/better way to search through a Data Set. I originally pointed them to the Data Set Management utility, and they came back with "searching through millions of rows would take hours with the limited row display". Besides dumping to a text file or a staging table, is there a way to easily query a Data Set for debugging purposes?

We are running version 8.0.1 on Windows and are in the process of upgrading to 8.7 on Windows.

Thanks - - John

ray.wurlod · Post by **ray.wurlod** » Sat Jun 08, 2013 12:53 am

Short answer: no.

Probably the fastest would be a parallel job that reads the Data Set and uses a Transformer stage to effect the search. You can run this with more nodes than exist in the Data Set.

SURA · Post by **SURA** » Wed Jun 12, 2013 10:42 pm

What Ray said is the best way. You can also use the orchadmin command to dump the data into a text file, then open that file in Excel or search it with grep (MKS Toolkit).

Again, it depends on your data volume, so you need to decide.

miwinter · Post by **miwinter** » Thu Jun 13, 2013 5:40 am

Use the debugger: set a breakpoint on the Data Set's output link, with a condition that matches the data you are searching for.

JPalatianos · Post by **JPalatianos** » Wed Jul 10, 2013 1:41 pm

I appreciate all the suggestions!!

rameshrr3 · Post by **rameshrr3** » Wed Jul 10, 2013 2:40 pm

I vote hands down for orchadmin with the dump option, piped to a grep condition. The Data Set Management utility does not scale. If you are on one of the newer versions, you can also use the debugger.
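Where plain grep is too coarse (for example, when the value must match in one particular column rather than anywhere in the row), the same dump-and-filter idea can be sketched in a few lines of Python. This is only a rough illustration: the function name `search_rows`, the comma delimiter, and the zero-based column position are all assumptions and need to match however the dump was actually written.

```python
import re
import sys


def search_rows(lines, pattern, column=None, delimiter=","):
    """Yield rows whose text (or one delimited field) matches a regex.

    lines     -- any iterable of text lines, e.g. sys.stdin or an open file
    pattern   -- regular expression to search for
    column    -- zero-based field index to test; None tests the whole row
    delimiter -- field separator assumed to be used in the dump
    """
    regex = re.compile(pattern)
    for line in lines:
        row = line.rstrip("\r\n")  # tolerate Windows and Unix line endings
        if column is None:
            target = row
        else:
            fields = row.split(delimiter)
            if column >= len(fields):
                continue  # row is too short to contain the requested field
            target = fields[column]
        if regex.search(target):
            yield row


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python search_dump.py PATTERN [COLUMN] < dump.txt
    col = int(sys.argv[2]) if len(sys.argv) > 2 else None
    for match in search_rows(sys.stdin, sys.argv[1], col):
        print(match)
```

Because it reads line by line, it can take the dump output straight from a pipe without first holding millions of rows in memory.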

sendmkpk · Post by **sendmkpk** » Thu Jul 11, 2013 1:14 am

ray.wurlod wrote: Probably the fastest would be a parallel job that reads the Data Set and uses a Transformer stage to effect the search. You can run this with more nodes than exist in the Data Set.

So, Ray, do you mean we could write the Data Set using one configuration file and read it with another? How is that possible?

Regards,
Praveen

ray.wurlod · Post by **ray.wurlod** » Thu Jul 11, 2013 3:04 am

Yes, that's what I'm saying. A copy of the configuration file used to write the Data Set is stored in its descriptor file, and this can be used to read the Data Set (the data then have to be automatically re-partitioned onto the nodes of the currently active configuration file). DataStage looks after that for you. If you prefer to use the orchadmin command, specify the -x option.