Teradata Multiload and EE - error

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
Rajvnb
Participant
Posts: 2
Joined: Tue Jul 31, 2012 3:47 am

Teradata Multiload and EE - error

Post by Rajvnb »

Hi All. Am new to DataStage 8 and Teradata. I have an issue running a Teradata query thru Datastage ODBC or TD Connector. With and without subquery is running weird.

The issue is, I have 2 tables, A and B. A_table has 61 million records, and B_table has 2 million records.
B_table is actually used as a subquery with the A_table. Like the below,

SELECT 50 columns FROM A_TABLE SAMPLE 10
WHERE ship_dt IN
( SELECT (CASE WHEN B_TABLES.SDAJ > 0 THEN
CAST ((TRIM(CAST((CAST((B_TABLES.SDAJ / 1000) AS INTEGER) +1900) AS INTEGER) )
ELSE (DATE'1600-01-01')
END) Sh_Dt FROM B_TABLES.SDAJ where B_TABLES.SDAJ = 0 GROUP BY 1)

Issue:
When I run the above query in TD SQL assistant, it runs for 2 mins and gives 10 records as output.The same query when ran in DataStage using ODBC or Teradata EE,
its running for more than 45 mins. Below are my findings,

1. When I run only SELECT 50 COLUMNS from A_TABLE SAMPLE 10. It runs with in 2 mins.
2. When I add the subquery and run the script, its getting delayed to 50 mins. (The output of the sub query is just a single record only.) the whole query runs with in 2 mins in Teradata SQL assistant

Query:
1. I would like to know, why adding the subquery is delaying the extract in Datastage, when the same query is running within 2 minutes with and without
the above subquery.
2. Am not sure,if the Datastage source stages will first extract all the records from BASE table and then will do the join with the subquery table.
3. The performance statistics is not showing the record flow also. Which means the Query is still running or somewhere is blocked.

Kindly clarify.

Regards
Raj
Regards,
Raj
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

DataStage is just a client, it sends the SQL to the database for parsing & execution just like TD SQL Assistance. It doesn't change at all the way SQL works in any database. That being said, I don't have an answer for the behaviour you are seeing. For that, I would suggest involving your TD DBA, have them trace the session and see what is going on. Veryify that it is running the SQL that you think it is and that the explain plan is correct. Your DBA should also be able to determine if there are any 'blocks' or locks or issues of that nature. Please post what you find.

From this side of the glass, all we could do is guess.
-craig

"You can never have too many knives" -- Logan Nine Fingers
Rajvnb
Participant
Posts: 2
Joined: Tue Jul 31, 2012 3:47 am

Re: Teradata Multiload and EE - error

Post by Rajvnb »

Thanks Chulett, I get that. It should be just passing the script to TD. But, not sure what makes it delay. Since the source data is huge, am going to try with the Teradata EE stage, as it support FASTEXPORT utility and I read in the forums that its efficient when used as READ.

Any suggestions on how to plan the RequestedSessions and SessionsPerPlayer? I read that, if the RS is set to 8, then the player will be considered as 8, and TD session will run on the Players accordingly. Can I know how this Players, Sessions and AMPs are related?
Regards,
Raj
kwwilliams
Participant
Posts: 437
Joined: Fri Oct 21, 2005 10:00 pm

Re: Teradata Multiload and EE - error

Post by kwwilliams »

What settings did you have the Teradata connector set? I routinely pull many more rows than the two million you have referenced in less than a minute.

Access method: bulk
Isolation level: read uncommitted
Transaction record count: 0
Array size: Does not apply when using bulk, but you can still get a warning if the row size * the array size is greater than the teradata maximum buffer size

Max Sessions: Set to 0 and if you don't get good performance talk to your dba
Min Sessions: Set to 0 and if you don't get good performance talk to your dba
bcarlson
Premium Member
Premium Member
Posts: 772
Joined: Fri Oct 01, 2004 3:06 pm
Location: Minnesota

Post by bcarlson »

Personally, I think you really should use the TD Connector. It is the go-forward method in DS, and also supports Teradata's Parallel Transport - the go-forward toolset in TD.

We use TD Connector exclusively here and it works fine with any volume. As was noted in an earlier posting, we deal with huge volumes day-in, day-out.

TD EE stage uses the old FastExport and FastLoad utilities, so no matter what volume you have you will always use a utility slot (there is a limited number of these in TD). TD Connector allows you to specify bulk vs. immediate - utility vs. non-utiltiy.

Brad.
It is not that I am addicted to coffee, it's just that I need it to survive.
bcarlson
Premium Member
Premium Member
Posts: 772
Joined: Fri Oct 01, 2004 3:06 pm
Location: Minnesota

Re: Teradata Multiload and EE - error

Post by bcarlson »

kwwilliams wrote:Max Sessions: Set to 0 and if you don't get good performance talk to your dba
Min Sessions: Set to 0 and if you don't get good performance talk to your dba
Depending on your version of TD, if you use 0, then TD may automatically open 1 session per available AMP. If you have 100 AMPS on your system, you will use 100 sessions. We have 1099 AMPS on ours, and our DBAs shoot to kill if we specify 0.

You should always specify a maximum number of sessions even if you don't specify a minimum. This applies to both DS jobs and TD scripts (TPT or FastExport or FastLoad, etc.).

Brad.
It is not that I am addicted to coffee, it's just that I need it to survive.
kwwilliams
Participant
Posts: 437
Joined: Fri Oct 21, 2005 10:00 pm

Re: Teradata Multiload and EE - error

Post by kwwilliams »

Thanks Brad, the max was actually a copy and paste typo, I keep that set to 14, but still stick with the comment that the original poster should talk to his TD admin.
bcarlson
Premium Member
Premium Member
Posts: 772
Joined: Fri Oct 01, 2004 3:06 pm
Location: Minnesota

Post by bcarlson »

I heartily agree - your DBA will be a great resource. They may have reocmmendations about how many sessions to use. In our environment, we vary the # of sessions based on the volume of data (rec count times rec size). That also helps us detemine if a load utility is required or if it is small enough to use a standard 'immediate' mode load (which has no load slot usage).

Brad.
It is not that I am addicted to coffee, it's just that I need it to survive.
Post Reply