BigIntegrate Questions

Posted: Tue Sep 12, 2017 10:47 pm
by dsuser_cai
Hi

We have BI 11.5 on Linux, on the Hortonworks Data Platform. We are building a large data lake in Hadoop. I have a few questions about how BI 11.5 works. I couldn't find answers to these on Google, so could you please shed some light?

1. Hive queries - If I run a Hive query interactively from PuTTY, the default execution engine is Tez, although we can switch it to the MapReduce engine. But how does BigIntegrate work? For example, when I use a Hive stage, does it run on the Tez or the MR engine? What if I use a File Connector stage to build a Hive table?
2. Dynamic configuration file - Can someone provide an IBM link with details on what the dynamic configuration file is and how it differs from the traditional DS configuration file?
3. Is there a way to start a Tez session and keep it open, run all my Hive queries, and then close the Tez session once they have all completed? I would like to do this in a BI job. For example, I want to open the Tez session using an Execute Command stage (shell script), run all my jobs that use Hive loads and extractions, and finally close the Tez session. I don't know whether BI uses Tez in the background, but when I run HiveQL it runs on Tez, so I'm assuming BI would also use Tez in the background. I may be completely wrong.

So any advice on these would be very helpful.

Posted: Sun Sep 24, 2017 7:16 am
by Timato
1) AFAIK the Hive stage is merely a wrapper for the Hive JDBC connector. My recollection is that execution is deferred to the default for your environment (doesn't switching to MR require restarting Hive anyway?). If you're using the File Connector to create a Hive table, I don't imagine it would need to invoke an MR/Tez job anyway; it should be restricted to the name node/metastore.
2) The DS Knowledge Center pages are OK, but I found this random IBM doc floating around on LinkedIn to be massively useful:
https://app.box.com/s/b0wonh8vv5bn8g8eaaj76cy7deui27cx (specifically page 30 onwards for your question)
(credit: https://www.linkedin.com/pulse/informat ... -malhotra/)
3) Sorry, no idea on this one. You may not have that level of granular control within DataStage/BI.
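
On the engine question in 1), here is a rough sketch of one way to pin the engine for a batch of statements outside of DataStage. It assumes the engine can be chosen per session with the standard HiveQL `SET hive.execution.engine=...;` statement. Whether the Hive stage passes such a statement through (for example as before-SQL) is an assumption I have not verified. The helper name `build_hive_script` and the table names are made up for illustration.

```python
# Sketch only: builds one HiveQL script that sets the execution engine first,
# so every statement that follows runs in the same Hive session.
# build_hive_script and the table names below are hypothetical.

VALID_ENGINES = {"mr", "tez", "spark"}  # values accepted by hive.execution.engine


def build_hive_script(queries, engine="tez"):
    """Return a HiveQL script: one SET statement, then each query in order."""
    if engine not in VALID_ENGINES:
        raise ValueError(f"unknown engine: {engine!r}")
    lines = [f"SET hive.execution.engine={engine};"]
    for query in queries:
        # Normalise so each statement ends with exactly one semicolon.
        lines.append(query.strip().rstrip(";") + ";")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    script = build_hive_script(
        [
            "INSERT INTO TABLE lake.target SELECT * FROM lake.staging",
            "SELECT COUNT(*) FROM lake.target;",
        ],
        engine="tez",
    )
    print(script)
```

The resulting text could be saved to a file and run in a single Beeline session with `beeline -u <jdbc-url> -f <file>`. As far as I understand, statements run in one Hive session share that session's Tez resources, which may go some way towards question 3), but I have not tested this from a BI job.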