BDFS Row Column Number

Posted: Fri Apr 24, 2015 12:53 pm
by eli.nawas_AUS
I would like to understand whether the "Row Column Number" option in the BDFS file input step is impacted by partitioning or other parallelism options, or if it will always produce row numbers matching the source file.

Posted: Fri Apr 24, 2015 5:01 pm
by ray.wurlod
Why not perform some experiments and let us know?

Posted: Sat Apr 25, 2015 6:56 am
by chulett
... probably not many people are actively using the stage. Worst case, contact support. Either way, let us know what you find!

Posted: Sat Apr 25, 2015 3:22 pm
by ray.wurlod
Keep in mind, too, that "the file" is not necessarily a valid concept in Big Data. Data in what is logically "a file" will more than likely be distributed across nodes in a Hadoop distributed file system or similar. So what can "row number" mean in this context?
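
To make the ambiguity concrete, here is a minimal Python sketch of two things "row number" could mean once a logical file is split into blocks. It is purely illustrative and says nothing about how the BDFS stage actually behaves; the function names and the five-row sample are hypothetical.

```python
def split_into_blocks(rows, num_blocks):
    """Split rows into contiguous blocks, roughly as a distributed file system might."""
    size = -(-len(rows) // num_blocks)  # ceiling division
    return [rows[i:i + size] for i in range(0, len(rows), size)]

def number_per_partition(blocks):
    """Interpretation 1: each partition numbers its own rows, starting from 1."""
    return [[(n, row) for n, row in enumerate(block, start=1)] for block in blocks]

def number_globally(blocks):
    """Interpretation 2: add each block's starting offset so the numbers
    follow the order of the logical file."""
    numbered, offset = [], 0
    for block in blocks:
        numbered.append([(offset + n, row) for n, row in enumerate(block, start=1)])
        offset += len(block)
    return numbered

rows = ["a", "b", "c", "d", "e"]          # hypothetical file contents
blocks = split_into_blocks(rows, 2)       # [["a", "b", "c"], ["d", "e"]]
local_numbers = number_per_partition(blocks)
global_numbers = number_globally(blocks)

print(local_numbers)   # row "d" is numbered 1 within its own block
print(global_numbers)  # row "d" is numbered 4 in logical file order
```

Under the first interpretation the same number appears once per partition; only the second matches the source file, and it requires knowing how many rows precede each block. Which of these the stage produces is exactly what an experiment would have to show.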