How do you ingest your data into datalake, ...?


olgc
Participant
Posts: 145
Joined: Tue Nov 18, 2003 9:00 am


Post by olgc »

Hi there,

How do you ingest data into your data lake: DataStage, Dataset, Informatica, Sqoop, ...?

I have been trying this for a while with DataStage and have opened cases with both IBM and Cloudera. With the JDBC driver, we can ingest a small number of records (fewer than 1,000) successfully, but a load of just 10 thousand rows ran for hours. The good news is that extracting from the data lake works very well, just like from any other data store.

Let us know how you do it.

Thanks,
olgc

Post by olgc »

To be clear, by data lake I mean one built with Hadoop technology: Hive, Impala, HBase, and/or Kudu.

Thanks,
qt_ky
Premium Member
Posts: 2895
Joined: Wed Aug 03, 2011 6:16 am
Location: USA

Post by qt_ky »

Our place does not have any of those technologies, but have you tried the File connector stage?

https://www.ibm.com/support/knowledgece ... arent.html
Choose a job you love, and you will never have to work a day in your life. - Confucius
olgc

Post by olgc »

[quote="qt_ky"]Our place does not have any of those technologies, but have you tried the File connector stage?[/quote]

Yes, we did, but unfortunately we couldn't get it to work because of a permission issue (we set the highest security level possible on our Hadoop platform, so permissions are always a tough task).
We got it working well outside of DataStage with sftp or the Linux scp command. So one solution is to write the target as a file, transfer the file to the Hadoop platform, and then use the Impala / Hive LOAD DATA statement to load it into a table.
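
For anyone who wants to script that file-based route, here is a minimal Python sketch of the three steps: copy the file with scp, put it into HDFS, then issue LOAD DATA. It only builds the commands and the SQL text; it does not run them. The host name, user, paths, and table name are hypothetical placeholders, and the `hdfs dfs -put` step is my assumption about how the file gets from the edge node's local disk into HDFS (LOAD DATA INPATH expects an HDFS path). Adjust it to your own security setup.

```python
import shlex


def build_scp_command(local_file, user, host, remote_path):
    """Build the scp argument list that copies local_file to the Hadoop edge node."""
    return ["scp", local_file, f"{user}@{host}:{remote_path}"]


def build_hdfs_put_command(remote_path, hdfs_dir):
    """Build the command (to be run on the edge node) that copies the file into HDFS.

    Assumption: the file lands on the edge node's local disk first.
    """
    return ["hdfs", "dfs", "-put", "-f", remote_path, hdfs_dir]


def build_load_statement(hdfs_path, table, overwrite=False):
    """Build a Hive/Impala LOAD DATA statement for a file already in HDFS."""
    escaped = hdfs_path.replace("'", "\\'")  # keep a stray quote from breaking the literal
    mode = "OVERWRITE " if overwrite else ""
    return f"LOAD DATA INPATH '{escaped}' {mode}INTO TABLE {table}"


if __name__ == "__main__":
    # All names below are hypothetical placeholders.
    scp_cmd = build_scp_command("/tmp/orders.csv", "etl", "edge01", "/data/in/orders.csv")
    put_cmd = build_hdfs_put_command("/data/in/orders.csv", "/landing/")
    sql = build_load_statement("/landing/orders.csv", "staging.orders")

    print(shlex.join(scp_cmd))
    print(shlex.join(put_cmd))
    print(sql)
```

The scp command could be run from an after-job routine or a wrapper script (for example with `subprocess.run(scp_cmd, check=True)`), and the LOAD DATA statement submitted through whatever client you already use, such as beeline or impala-shell.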

Later we developed a better solution that makes this easy and very productive. Please refer to https://www.linkedin.com/pulse/datalake ... ven-huang/ for a glimpse of the solution.

Thanks,