If the incoming data are not sorted, or even if they are but you don't tell the Aggregator stage, then the Aggregator stage must keep the progressive set function results (sum, count, etc.) in little buckets in memory, one bucket for each combination of the grouping columns. The total size in memory is a function of the number of grouping columns, the number of aggregation fields, and the data type - and therefore size - of each.
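To make the bucket idea concrete, here is a minimal sketch (not the actual Aggregator implementation) of unsorted grouped aggregation in Python: one in-memory bucket per combination of grouping-column values. The column names "region" and "amount" are illustrative only.

```python
def aggregate_unsorted(rows, group_cols, agg_col):
    """Sum and count agg_col per combination of group_cols.

    Every bucket must stay in memory until all input has been read,
    because any grouping-key combination might recur later.
    """
    buckets = {}  # tuple of grouping values -> [running sum, running count]
    for row in rows:
        key = tuple(row[c] for c in group_cols)
        bucket = buckets.setdefault(key, [0, 0])
        bucket[0] += row[agg_col]
        bucket[1] += 1
    return {k: {"sum": s, "count": n} for k, (s, n) in buckets.items()}

rows = [
    {"region": "E", "amount": 10},
    {"region": "W", "amount": 5},
    {"region": "E", "amount": 7},  # key ('E',) recurs: its bucket must still exist
]
print(aggregate_unsorted(rows, ["region"], "amount"))
# {('E',): {'sum': 17, 'count': 2}, ('W',): {'sum': 5, 'count': 1}}
```

Note that nothing can be emitted until the last row has been read, which is exactly why memory grows with the number of distinct grouping-key combinations.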
If the incoming data are sorted (by the grouping column(s)) and you assert this on the Aggregator's input link, then far less memory is consumed and greater speed is obtained.

This is because, whenever a value changes in one of the sorted columns, the previous value will never recur, so the buckets used for that value (and anything sorted within it) can be pushed down the output link and their memory freed for re-use. Indeed, only one bucket at a time is needed for the column sorted first.
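The streaming version can be sketched as follows, under the same illustrative column names as before. Only a single bucket lives in memory at any moment; it is flushed down the output link each time the grouping key changes.

```python
def aggregate_sorted(rows, group_cols, agg_col):
    """Generator over (key, results) pairs.

    Assumes rows arrive sorted by group_cols. Keeps exactly one bucket:
    when the key changes, the finished bucket is yielded (pushed out the
    output link) and its memory re-used for the new key.
    """
    current_key, total, count = None, 0, 0
    for row in rows:
        key = tuple(row[c] for c in group_cols)
        if key != current_key:
            if current_key is not None:
                # This key will never recur, so the bucket can be emitted now.
                yield current_key, {"sum": total, "count": count}
            current_key, total, count = key, 0, 0
        total += row[agg_col]
        count += 1
    if current_key is not None:
        yield current_key, {"sum": total, "count": count}  # flush the last bucket

rows = [
    {"region": "E", "amount": 10},
    {"region": "E", "amount": 7},
    {"region": "W", "amount": 5},
]
print(list(aggregate_sorted(rows, ["region"], "amount")))
# [(('E',), {'sum': 17, 'count': 2}), (('W',), {'sum': 5, 'count': 1})]
```

Because it is a generator, results flow downstream as soon as each group completes, rather than only after the whole input has been read.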
Normal "idiot proofing" requires a check that no row arrives out of order on a grouping column asserted to be sorted, because the more efficient algorithm would generate incorrect results under those circumstances.
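Such a guard might look like the following sketch, which raises an error as soon as a grouping key sorts before the one that preceded it (the function name and error wording are illustrative, not from any product):

```python
def check_sort_order(rows, group_cols):
    """Raise ValueError on the first row whose grouping key sorts before
    the previous row's key, i.e. the "sorted" assertion was false."""
    prev_key = None
    for i, row in enumerate(rows):
        key = tuple(row[c] for c in group_cols)
        if prev_key is not None and key < prev_key:
            raise ValueError(
                f"row {i} out of order: key {key!r} after {prev_key!r}"
            )
        prev_key = key

check_sort_order([{"k": 1}, {"k": 1}, {"k": 2}], ["k"])  # passes silently
```

Without this check, the streaming algorithm would have already emitted and discarded the bucket for the earlier key, so the late-arriving row would silently start a fresh, wrong bucket.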
The above discussion ought, with a bit of thought, to indicate why an Aggregator stage is intolerant of NULL: a NULL grouping value cannot reliably be compared or ordered, so neither the bucket lookup nor the sorted-order check is well defined for it.
The best way to understand the Aggregator stage is to do a bit of role playing. Get a whiteboard, or a large piece of paper, and actually pretend that you are the Aggregator stage doing its thing. For each row, you need a bucket containing each grouping column and each aggregating column. If you have no a priori knowledge about the data, you must keep adding buckets as new combinations of grouping column values arrive. If you do have knowledge that a particular grouping column value will not arrive again, then you can record the results from those buckets, wipe them off, and re-use the whiteboard space.