
How do you ingest your data into a datalake, ...?

Posted: Mon May 18, 2020 8:53 am
by olgc
Hi there,

How do you ingest your data into a datalake: DataStage, Dataset, Informatica, Sqoop, ...?

I was trying this for a while with DataStage, creating cases with both IBM and Cloudera. With the JDBC driver, we can ingest a small number of records (fewer than 1,000) successfully, but it ran for hours on just 10 thousand rows. The good news was that it is very good at extracting from the datalake, just like from any other data store.

Let us know how you do it.

Thanks,

Posted: Mon May 18, 2020 11:49 am
by olgc
To be clear, by datalake I mean one created with Hadoop technology: Hive, Impala, HBase, and/or Kudu.

Thanks,

Posted: Thu May 21, 2020 11:24 am
by qt_ky
Our place does not have any of those technologies, but have you tried the File connector stage?

https://www.ibm.com/support/knowledgece ... arent.html

Posted: Mon May 25, 2020 1:19 pm
by olgc
[quote="qt_ky"]Our place does not have any of those technologies, but have you tried the File connector stage?

Yes, we did, but unfortunately we couldn't get it to work due to a permission issue (we set the highest security level possible for our Hadoop platform, so permissions are always a tough task).
We got it working well outside of DataStage using sftp or the Linux command scp. So one solution is to create the target as a file, transfer the file to the Hadoop platform, and then use an Impala / Hive LOAD DATA statement to load it into the table. A minimal sketch of that workflow is below.
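Here is a rough sketch of that file-based approach, assuming a local export file /tmp/export.csv, an edge node hadoop-edge reachable by the ETL user etl_user, an HDFS staging path /staging, and a Hive table stage_db.my_table (all of these names are placeholders, not the actual setup):

[code]
# 1. Copy the file produced by the DataStage job to the Hadoop edge node
scp /tmp/export.csv etl_user@hadoop-edge:/tmp/export.csv

# 2. Put the file into an HDFS staging directory
ssh etl_user@hadoop-edge "hdfs dfs -put -f /tmp/export.csv /staging/export.csv"

# 3. Load the staged file into the Hive table (hiveserver host is a placeholder)
ssh etl_user@hadoop-edge "beeline -u jdbc:hive2://hiveserver:10000 \
  -e \"LOAD DATA INPATH '/staging/export.csv' INTO TABLE stage_db.my_table;\""
[/code]

Note that LOAD DATA INPATH moves the file out of the staging directory, so each run needs a fresh copy; on a secured cluster the etl_user would also need a Kerberos ticket and write access to the staging path.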

Later we developed a better solution that makes this easy and very productive. Please refer to https://www.linkedin.com/pulse/datalake ... ven-huang/ for a glimpse of the solution.

Thanks,