Hi all,
I am facing a strange issue that I've never come across before, involving hash file creation. I have a process that does two SQL extracts from a database, performs some basic transformation and outputs to a hash file. Here is a screenshot of the job:
The problem is that the other day one of the transforms (the bottom one in the image above) wrote individual records to separate files, and I cannot make sense of it. Instead of the transformer writing records to a single hash file, it ended up creating roughly 150,000 individual files in the hash directory (the 176K in the above image is from the latest run; this count read around 150K on the problematic run in question). The impact was that we ran out of inodes on our UNIX box, because the file creation pushed us over the inode limit. As a result, other DS jobs fell over since there was no space left to write logs, temp directories, etc.
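For anyone wanting to catch this condition before it takes the box down, a quick sketch using standard UNIX tools is below. The path is a placeholder, not the poster's actual location; substitute your own hashed file directory:

```shell
#!/bin/sh
# Placeholder path -- substitute your project's hashed file directory.
HASHDIR="${HASHDIR:-/path/to/hashed_file}"

# 1. Filesystem inode headroom: a high IUse% column means you are
#    approaching the inode exhaustion described above.
df -i "$HASHDIR" 2>/dev/null || df -i

# 2. Entry count inside the hashed file directory. A healthy dynamic
#    (type 30) hashed file normally contains just DATA.30, OVER.30
#    and the hidden .Type30 marker; tens of thousands of entries
#    indicate the runaway file creation.
count=$(ls -1A "$HASHDIR" 2>/dev/null | wc -l)
echo "entries in $HASHDIR: $count"
```

Wiring a check like this into a monitoring cron job would at least give an alert before every other job on the box starts failing.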
After a recompile and restart, the job ran as normal, creating a typical hash file with an 85MB DATA.30 and a 26MB OVER.30.
If anyone has encountered a similar issue, I'd really like to hear what you found to be the cause and how you ensured it didn't happen again. I'm hesitant to keep this job running (even though it is seemingly running fine now) in case it topples every job running in production.
Thanks in advance.
Hash file job - individual records going to separate files
Check that you have exactly the same file name on both input links to the Hashed File stage - not only identically spelled, but also identically cased. Incidentally, your link row count is 1.7 million, not 176K.
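One way to automate this check (assuming a UNIX account directory and GNU coreutils, neither of which is confirmed in the thread) is to list directory entries that collide only when case is ignored:

```shell
# From the directory that holds the hashed files (path is an
# assumption), print names that are identical except for case --
# exactly the spelled-alike-but-cased-differently collision above.
ls -1 | sort -f | uniq -di
```

An empty result means no case-only collisions among the entries; any line printed is a candidate pair worth inspecting in the job design.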
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Hi Ray, my apologies - it actually aborted at around 150K records (I should have left out the bit about the record count in the image).
L_CSTID_TF_O outputs to a hashed file named ac_ar_lookup.
L_CST_MKT_SEGMENT_TF_O outputs to a hashed file named cst_mkt_segment_id_lookup.
It is the former that is having the issue. I can confirm that the spelling and casing are both OK.
I should add that this job has been in production since 2008 and this is the first time I, or anyone in my team, has come across this problem.
Cheers