Page 1 of 1

Reading xls using Datastage

Posted: Wed Jul 12, 2017 5:57 am
by SURA
Hi

We have to start a new project to read the xls file and load the data into a sql server table .

The challenges I have is

1. ETL need to read the data from 3,4,5...N sheets in a file .
2. in each sheet need to pull some specific range cell values, not the whole file !
3. Other known issue we had was, if the data in the first 15 rows is integer and then date: in such case irrespective of the data type the odbc driver scan and consider it as an int and convert the date as int.

One person is recommending to use SSIS!

I can understand the 11X can do better job than 8.7. But with 8.7 what is the best way to handle it!!

Please throw some light !

Posted: Wed Jul 12, 2017 7:20 am
by UCDI
I know 11x can read excel directly, but yours may not have that. The stage that can do it is the unstructured data stage.

Two things you should know:
1) excel is capable of talking to a database; you can load data from it directly.
2) if your version does not have the stage, you can export the excel as CSV which is plain text, comma delimited and use that in datastage very easily.

You may need to change the column data formats in the excel before you export it, but the tasks you list all sound possible.

Posted: Wed Jul 12, 2017 7:41 am
by chulett
Oh, they are all possible but you'll have all of the "normal" PITA problems we all have with using Excel as a source... as noted, one awesome issue is how it loves to turn things into numbers. And the people building the spreadsheets need to be very disciplined and consistent, something I have yet to see. :( The problems are worse when you don't have something like the Unstructured Data stage to help. From what I recall, the earliest version of that stage showed up in the 9.1 release.

As noted, on a Windows server Excel can talk directly to SQL Server so there may not be a need to involve DataStage until Excel has "staged" it and then you can do the specific ranges / not the whole file bits from there via ETL. Or use the ODBC driver or worst case a csv file per sheet.

Did I mention I hate having to use Excel as a source? :wink:

Posted: Wed Jul 12, 2017 2:05 pm
by SURA
chulett wrote:Oh, they are all possible but
Thank you so much for your reply.

Is that means in 8.7 we can do the range selection ?

Posted: Wed Jul 12, 2017 2:07 pm
by SURA
Thank you UCDI

Posted: Wed Jul 12, 2017 2:43 pm
by chulett
SURA wrote:Is that means in 8.7 we can do the range selection ?
You'd have to answer that... seems to me the answer is "yes" only if that functionality is provided by the ODBC drivers. Failing that you stage the whole file and then from the staging area pull just the data elements / ranges / rows you need.

Posted: Thu Jul 13, 2017 1:10 pm
by ray.wurlod
Using the ODBC driver for Microsoft Excel, each worksheet must be treated as a separate "table" (and there are some restrictions on the format). When importing the table definitions, you must specify to include system tables, because the driver treats each worksheet as a system table for some reason.

Posted: Wed Jul 19, 2017 3:39 pm
by SURA
I written a PowerShell script to store the data as csv and used Datastage to load.

Thanks to all for your help and suggestions.