You're close!
1/2) Yes, in THIS job, this is the only physical dataset. That does NOT mean it's automatically assigned the id ds0.
3) I think it's more accurate in this case to say that the input dataset (the physical one) is not being repartitioned.
4) This would be the DataSet stage--the name here will not always match what you see in the Designer palette.
So to break it down by dataset:
1-4) ds0: describes the first dataset and what it feeds into (the Dataset stage or "Source"--actually the copy operator, IIRC). No repartitioning.
5-9) ds1: More or less correct. The output of the Dataset operator is the source for ds1; it is being Hash partitioned and fed into the Sort operator with auto collection.
10-12) ds2 is the output of the Copy and feeds into the inserted buffer operator. Because the Sort and Copy are combined into op1, there is no true dataset linking them; virtual datasets exist between processes, not between operators within the same process (a combined or composite operator).
13-18) ds3 links the output of the parallel buffer to the input of the Sequential File stage (actually the export operator). The SortedMerge collector is used--I don't know if you selected that in the Stage or if that's what DS chose. If DS chose it, the engine may have inferred the job's intention: a sorted sequential file. Another collection method wouldn't have guaranteed that result.
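To see why SortedMerge is the collector that preserves the sort, here's a small Python sketch (illustrative only, not DataStage code) modeling hash partitioning across four "nodes", a per-partition sort, and then a sorted-merge collection into one sequential stream:

```python
import heapq
import random

random.seed(42)

def hash_partition(rows, n_parts):
    """Distribute rows across partitions; row % n_parts is a
    simple stand-in for a real hash-key function."""
    parts = [[] for _ in range(n_parts)]
    for row in rows:
        parts[row % n_parts].append(row)
    return parts

rows = random.sample(range(1000), 20)

# Each partition is sorted independently (the Sort operator runs per node).
parts = [sorted(p) for p in hash_partition(rows, 4)]

# SortedMerge collection: merge the already-sorted partition streams,
# so the single sequential output keeps a global sort order.
merged = list(heapq.merge(*parts))
assert merged == sorted(rows)
```

A collector that simply concatenated or round-robined the partitions would interleave rows from differently-keyed partitions, and the global order of the sequential file would be lost.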
A basic dataset description as shown in the score:
ds#: {source of dataset, Partition>>Collection, target of dataset}
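As a quick illustration of reading that notation, here's a hedged Python sketch that picks apart one such dataset line. The line text and the exact regex are made up for illustration--real score output varies by version and formatting:

```python
import re

# Pattern for a simplified score dataset line of the assumed form:
#   ds3: {op1, Any>>SortedMerge, op2}
SCORE_DS = re.compile(r"(ds\d+):\s*\{(\w+),\s*(\w+)>>(\w+),\s*(\w+)\}")

line = "ds3: {op1, Any>>SortedMerge, op2}"  # hypothetical example line
m = SCORE_DS.match(line)
name, source, partition, collection, target = m.groups()
# name='ds3', source='op1', partition='Any',
# collection='SortedMerge', target='op2'
```

The part before `>>` is the partitioning method applied to the dataset, and the part after it is the collection method, which lines up with the breakdown above.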
I don't recall exactly where partitioning/collection physically happens (at the input or the output of a dataset). I'm going to SWAG that partitioning happens at the source end of a dataset/link and collection happens at the target end. It generally isn't a separate process but is attached to another, although I won't promise that is always the case.
Regards,