Multiple readers per node loading into ODBC

VCInDSX
Premium Member
Posts: 223
Joined: Fri Apr 13, 2007 10:02 am
Location: US

Multiple readers per node loading into ODBC

Post by VCInDSX »

Hi Group,
My job design is as follows. This is in a 4-node GRID environment.

Code:

Seq_File ==> Transformer ==> ODBC
 ||
 ||
 \/
Rejected records
The job was using a default one-node config file, and some of our larger files take a very long time to load.

I am trying this with different values for the number of nodes and partitions per node to see if it helps in faster loading.
One other option that I would like to try is the "Multiple readers per node".

In any of the above attempts, how will the underlying job behave when it comes to the target ODBC stage? Will it spawn multiple ODBC connections to load the data, or will all the readers be queued up at the target load phase?

The ODBC stage's execution property is currently set to "Default (Parallel)".

One thing for sure is that the target Database table (SQL Server 2005) is NOT partitioned.

Any suggestions in this regard will be much appreciated.

Thanks
-V
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

The ODBC Enterprise stage will establish one connection to the data source per processing node defined in the configuration file.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
VCInDSX
Premium Member
Posts: 223
Joined: Fri Apr 13, 2007 10:02 am
Location: US

Post by VCInDSX »

Hi Ray,
Thanks for the reply. That helps in clearing up a few items.

A few additional questions regarding the nodes, especially in the GRID world. I am trying to understand how some of the PX GRID parameters boil down into code during execution, and would greatly appreciate your feedback if I am not interpreting the log messages correctly.

The configuration file that is generated dynamically by the resource manager emits the node sections according to the values that one specifies for the GRID environment variables.

If I specify
$APT_GRID_COMPUTENODES = 2 and $APT_GRID_PARTITIONS = 4, then the dynamic config file contains sections as given below
In effect, it is what you have explained in your earlier response. Are these the entries one should look at to get an idea of how many nodes and instances are being launched in the back-end?

Code:

<Dynamic_gird.sh> SEQFILE Host(s): ctpcqabdsc01p: ctpcqabdsc02p:
{
         node "Conductor"
 {
  fastname "ctpcqabdsh01p"
  pools "conductor"
  resource disk "/nfsdata/data1/datasets" {pools ""}
  resource scratchdisk "/scratch" {pools ""}
 }
 node "node1_1"
 {
  fastname "ctpcqabdsc01p"
  pools ""
  resource disk "/nfsdata/data1/datasets" {pools ""}
  resource scratchdisk "/scratch" {pools ""}
 }
 node "node1_2"
 {
  fastname "ctpcqabdsc01p"
  pools ""
  resource disk "/nfsdata/data1/datasets" {pools ""}
  resource scratchdisk "/scratch" {pools ""}
 }
 node "node1_3"
 {
  fastname "ctpcqabdsc01p"
  pools ""
  resource disk "/nfsdata/data1/datasets" {pools ""}
  resource scratchdisk "/scratch" {pools ""}
 }
 node "node1_4"
 {
  fastname "ctpcqabdsc01p"
  pools ""
  resource disk "/nfsdata/data1/datasets" {pools ""}
  resource scratchdisk "/scratch" {pools ""}
 }
 node "node2_1"
 {
  fastname "ctpcqabdsc02p"
  pools ""
  resource disk "/nfsdata/data1/datasets" {pools ""}
  resource scratchdisk "/scratch" {pools ""}
 }
 node "node2_2"
 {
  fastname "ctpcqabdsc02p"
  pools ""
  resource disk "/nfsdata/data1/datasets" {pools ""}
  resource scratchdisk "/scratch" {pools ""}
 }
 node "node2_3"
 {
  fastname "ctpcqabdsc02p"
  pools ""
  resource disk "/nfsdata/data1/datasets" {pools ""}
  resource scratchdisk "/scratch" {pools ""}
 }
 node "node2_4"
 {
  fastname "ctpcqabdsc02p"
  pools ""
  resource disk "/nfsdata/data1/datasets" {pools ""}
  resource scratchdisk "/scratch" {pools ""}
 }
}
The debug information about datasets shows

Code:

main_program: This step has 3 datasets:
ds0: {op0[1p] (sequential SF_RCPInputFile)
      eAny<>eCollectAny
      op1[8p] (parallel APT_TransformOperatorImplV0S1_GEMAsCollectedLoader_V2_Job_XFM_RCP_2_Database in XFM_RCP_2_Database)}
ds1: {op0[1p] (sequential SF_RCPInputFile)
      ->eCollectAny
      op3[1p] (sequential APT_RealFileExportOperator in SF_Source_Rejects)}
ds2: {op1[8p] (parallel APT_TransformOperatorImplV0S1_GEMAsCollectedLoader_V2_Job_XFM_RCP_2_Database in XFM_RCP_2_Database)
      eAny=>eCollectAny
      op2[8p] (parallel ODB_AsCollected)}
It has 4 operators:
op0[1p] {(sequential SF_RCPInputFile)
    on nodes (
      Conductor[op0,p0]
    )}
op1[8p] {(parallel APT_TransformOperatorImplV0S1_GEMAsCollectedLoader_V2_Job_XFM_RCP_2_Database in XFM_RCP_2_Database)
    on nodes (
      node1_1[op1,p0]
      node1_2[op1,p1]
      node1_3[op1,p2]
      node1_4[op1,p3]
      node2_1[op1,p4]
      node2_2[op1,p5]
      node2_3[op1,p6]
      node2_4[op1,p7]
    )}
op2[8p] {(parallel ODB_AsCollected)
    on nodes (
      node1_1[op2,p0]
      node1_2[op2,p1]
      node1_3[op2,p2]
      node1_4[op2,p3]
      node2_1[op2,p4]
      node2_2[op2,p5]
      node2_3[op2,p6]
      node2_4[op2,p7]
    )}
op3[1p] {(sequential APT_RealFileExportOperator in SF_Source_Rejects)
    on nodes (
      node1_1[op3,p0]
    )}
It runs 18 processes on 9 nodes.
Further down, the execution log in the Director shows:

Code:

main_program: APT_PM_StartProgram: Locally - /nfsgrid/nfsbin/IBM/InformationServer/Server/PXEngine/etc/standalone.sh /nfsgrid/nfsbin/IBM/InformationServer/Server/PXEngine -APT_PMprotoSectionLeaderFlag --APTNoSetupProgram /nfsgrid/nfsbin/IBM/InformationServer/Server/PXEngine/etc/standalone.sh -APT_PMsetupFailedFlag /nfsgrid/nfsbin/IBM/InformationServer/Server/PXEngine/bin/osh.exe -APT_PMsectionLeaderFlag ctpcqabdsh01p 10002 0 30 Conductor ctpcqabdsh01p 1212125620.477694.1de9 0 -os_charset UTF-8  
APT_PM_StartProgram: Remotely - /nfsgrid/nfsbin/IBM/InformationServer/Server/PXEngine/etc/remsh -n ctpcqabdsc01p /nfsgrid/nfsbin/IBM/InformationServer/Server/PXEngine/etc/standalone.sh /nfsgrid/nfsbin/IBM/InformationServer/Server/PXEngine -APT_PMprotoSectionLeaderFlag --APTNoSetupProgram /nfsgrid/nfsbin/IBM/InformationServer/Server/PXEngine/etc/standalone.sh -APT_PMsetupFailedFlag /nfsgrid/nfsbin/IBM/InformationServer/Server/PXEngine/bin/osh.exe -APT_PMsectionLeaderFlag ctpcqabdsh01p 10002 1 30 node1_1 ctpcqabdsc01p 1212125620.477694.1de9 0 -os_charset UTF-8  
APT_PM_StartProgram: Remotely - /nfsgrid/nfsbin/IBM/InformationServer/Server/PXEngine/etc/remsh -n ctpcqabdsc01p /nfsgrid/nfsbin/IBM/InformationServer/Server/PXEngine/etc/standalone.sh /nfsgrid/nfsbin/IBM/InformationServer/Server/PXEngine -APT_PMprotoSectionLeaderFlag --APTNoSetupProgram /nfsgrid/nfsbin/IBM/InformationServer/Server/PXEngine/etc/standalone.sh -APT_PMsetupFailedFlag /nfsgrid/nfsbin/IBM/InformationServer/Server/PXEngine/bin/osh.exe -APT_PMsectionLeaderFlag ctpcqabdsh01p 10002 2 30 node1_2 ctpcqabdsc01p 1212125620.477694.1de9 0 -os_charset UTF-8  
...
...
...
truncated for brevity...
When these 2 nodes are working on this job, there is no way any other job can get onto these nodes, correct?

Also, I have not utilized the "Multiple readers per node" option in my design. Is it safe to assume that the input data is automatically partitioned by the engine into these 8 connections on the database?

Many thanks for your invaluable time,
-V
lstsaur
Participant
Posts: 1139
Joined: Thu Oct 21, 2004 9:59 pm

Post by lstsaur »

Correct. While the job is running (using 2 nodes), the Resource Manager will prevent additional jobs from using the "same" resources until the running job completes.
VCInDSX
Premium Member
Posts: 223
Joined: Fri Apr 13, 2007 10:02 am
Location: US

Post by VCInDSX »

Thanks lstsaur. That was helpful.

In our case the server is shared across several teams, and if one of them happens to submit a job that takes 2 hours of processing time, the rest of the folks are left to use the remaining nodes or wait until those nodes become free. Looks like this needs careful planning when it comes to scheduling jobs that might "hog" the resources...

Any thoughts on "Multiple readers" and splitting the data across the nodes and DB connections will be appreciated.

Thanks again,
-V
bcarlson
Premium Member
Posts: 772
Joined: Fri Oct 01, 2004 3:06 pm
Location: Minnesota

Post by bcarlson »

Multiple readers always help on your file reads unless you are dealing with tiny files. There are 2 options: "Read from Multiple Nodes", which gives you 1 reader per node, or "Number of readers per node", where you manually specify how many readers to use. If set to 1, it operates no differently than the "Read from Multiple Nodes" option; if set to more than 1, you get that many readers working on the file concurrently. The 2 options are mutually exclusive.

Brad
It is not that I am addicted to coffee, it's just that I need it to survive.
bcarlson
Premium Member
Posts: 772
Joined: Fri Oct 01, 2004 3:06 pm
Location: Minnesota

Post by bcarlson »

Using multiple connections can help speed up your processing as long as you can do it evenly. We pull data from a mainframe DB2 database down to our local UDB database on Unix and do something similar. We can't do a partitioned read from the source like we can on UDB, so we do it manually. For example, if there are 50 million records with numeric ids, you could pull 0-10 million in one query, 10M+1 - 20M in another query, 20M+1 - 30M in another, and so on. The key is to make sure you are manually partitioning your reads or writes evenly, just like the purpose of a partitioner in DS is to make sure you have even distribution.
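As a minimal sketch of how such an even split could be computed (the table name, column name and stream count here are illustrative only, not taken from the job above):

Code:

#!/bin/sh
# Hypothetical sketch: split a 50-million-row numeric key range into 5
# even, non-overlapping slices, one per parallel extract stream.
# Table and column names are illustrative only.
TOTAL=50000000
STREAMS=5
CHUNK=`expr $TOTAL / $STREAMS`
LOW=1
i=1
while [ $i -le $STREAMS ]
do
    HIGH=`expr $LOW + $CHUNK - 1`
    echo "Stream $i: SELECT * FROM source_table WHERE id BETWEEN $LOW AND $HIGH"
    LOW=`expr $HIGH + 1`
    i=`expr $i + 1`
done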

Reading is easier to 'partition' than writing. When writing you have to make sure you are not blocking/deadlocking yourself by having 3+ jobs writing to the same table in parallel. You could split your write into 3 parallel jobs or streams writing to 3 separate work tables, then use SQL to populate your target table. Of course, that increases your data footprint too.
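A rough illustration of that work-table consolidation step, assuming SQL Server's sqlcmd and purely illustrative server, database and table names:

Code:

#!/bin/sh
# Hypothetical sketch: after 3 parallel streams have loaded work_table_1..3,
# fold them into the real target with one set-based statement.
# Server, database and table names are illustrative only.
sqlcmd -S myserver -d mydatabase -Q "
INSERT INTO target_table
SELECT * FROM work_table_1
UNION ALL
SELECT * FROM work_table_2
UNION ALL
SELECT * FROM work_table_3;"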

Brad.
It is not that I am addicted to coffee, it's just that I need it to survive.
lstsaur
Participant
Posts: 1139
Joined: Thu Oct 21, 2004 9:59 pm

Post by lstsaur »

Again, in a grid environment, the Resource Manager will enforce limits on resource usage by jobs. The job you mentioned will be terminated automatically for exceeding the default Resource_Max value. In order to run it, the Advance Reservations attribute has to be set to the desired time. You can still run your job as long as you specify a different queue name; you don't have to wait until the nodes are free.

Finding the perfect mix of file servers, compute nodes and database resources is a time-consuming process. However, with the Resource Manager tool for the grid, I can monitor CPU usage on each type of node to determine whether there's a bottleneck, then keep adjusting the parameters until the job runs most efficiently. So, talk to the "person" who set up Grid Computing for you.
VCInDSX
Premium Member
Posts: 223
Joined: Fri Apr 13, 2007 10:02 am
Location: US

Post by VCInDSX »

Apologies for the delay. Thanks for taking the time to explain the details.

Brad,
At this time we are reading from text files and this happens to be an RCP job of the following design.
Seq File ==> Transformer ==> ODBC
||
||
\/
Rejects file

The sequential file has a "Keep File Partitions" property. Is there any other means/option that I should look at? This property is currently set to the default (False). Would enabling this help in any way?

Right now, I have set the GRID params to 1 node and 4 partitions per node. With this setting I can see (from the debug logs) that the manager has split the data into the 4 partitions by itself.

As for multiple nodes and multiple readers per node, they are mutually exclusive with the "First Line is Column Names" property that is enabled in our jobs... I need to evaluate that further or decide to send the header down the rejects path.

lstsaur,
I didn't realize that the job would be terminated. So far we have had jobs waiting in the queue for a couple of hours when the nodes were occupied by other jobs.
As for "You still can run your job as long as you specify a different queue name", would this require multiple queues to be configured by the resource manager?
We are just entering the phase where we have started seeing these queue backlogs. Next steps would be to do what you have suggested: monitor the resources and tune the parameters and node allocations.

Thanks again
-V
bcarlson
Premium Member
Posts: 772
Joined: Fri Oct 01, 2004 3:06 pm
Location: Minnesota

Post by bcarlson »

My understanding is that "Keep File Partitions" relates to reading filesets, where multiple files are read. In other words, if my fileset has 5 files (i.e. 5 partitions), then the output of the sequential stage will be those same 5 partitions.

If you need to ignore the column headers, then you may be out of luck. On the other hand, you could easily use a Unix script to chop off the first record; then the file can use the multiple readers.
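For example, a one-line script run before the job could drop the header (the file paths here are illustrative only):

Code:

#!/bin/sh
# Minimal sketch: strip the header row before the Sequential File stage
# reads the file, so the multiple-readers options can stay enabled.
# Paths are illustrative only.
tail -n +2 /data/in/RCPInputFile.txt > /data/in/RCPInputFile_nohdr.txt
# (sed '1d' on the same file would do the same job)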

I know you can use the filter option to eliminate the first record, but like the column header option, I believe it eliminates the option for multiple readers.

Hope this helps.

Brad.
It is not that I am addicted to coffee, it's just that I need it to survive.
VCInDSX
Premium Member
Posts: 223
Joined: Fri Apr 13, 2007 10:02 am
Location: US

Post by VCInDSX »

Hi Brad,
Apologies for the delay as I was "away" for some time...
Thanks for the followup. I think the best bet is to have a simple UNIX script to handle the header row before sending it to the seq file stage for further processing. We have some more time before we can make a decision on those changes in our project....

lstsaur & team,
....You don't have to wait until the nodes are free.
Is it possible to configure the resource manager so that the "idling" CPUs/partitions in a computing node can be used by other jobs? From the earlier responses, I got the impression that a computing node is "locked" by a job and not available for any other job until the submitted job completes and returns it to the resource manager. If that earlier job is utilizing only one of the 4 CPUs in a 4-CPU computing node, the remaining CPUs in that node are not available to any other job.

Thanks again for your invaluable time
-V