Quick question on the suitability of hash partitioning (or a suitable alternative/work around) for the following scenario -
Assuming I am reading two different data sets which have the same key (lets say employee id).
Data Set A
employee# - char 5
example values "A100 " , " A101"
Data Set B
Employee# - char 8
example values " A100" , " A101 "
If I partition both data sets using Hash partitioning and sort them, will the join later downstream be successful?
By the time I do the join, I have converted both to the same data type.. in this case varchar(4), but at the time of partitioning, since I am trying to do it up front, they were both different (char 5 and char 8 ) .
Would values such as "A100 " and " A100" across both data sets end up in the same partition or not?
Also would I need to wait to do the sort till after I have combined the data to the same data type as well , or can I also sort them up front?
Thanks!
Hash Partitioning on fields with different data lengths
Moderators: chulett, rschirm, roy
-
ray.wurlod
- Participant
- Posts: 54595
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
There's no guarantee that "A100" and " A100" will be on the same partition, since they're different values. So you will need to make them the same, possibly via string_trim() in a Modify stage or Trim() in a Transformer stage.
Values that are identical will end up on the same partition if the same partitioning algorithm (based on the same key(s)) is used.
Values that are identical will end up on the same partition if the same partitioning algorithm (based on the same key(s)) is used.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.