Hello,
To my understanding, page 13 of the White Paper is about the distribution strategy. With hash distribution, a hash function on the distribution column assigns each row to a distribution (bucket). This spreads the data evenly but deterministically, so it can be processed in parallel across nodes.
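A minimal sketch of the idea, assuming 60 distributions as in a dedicated SQL pool. The `distribution_for` helper is hypothetical, and it uses MD5 only as a stand-in, because Synapse's internal hash function is not exposed. The property that matters is that the same value always maps to the same bucket:

```python
import hashlib

NUM_DISTRIBUTIONS = 60  # a dedicated SQL pool spreads each table over 60 distributions


def distribution_for(value, num_distributions=NUM_DISTRIBUTIONS):
    """Map a distribution-column value to a bucket number.

    Hypothetical stand-in hash (MD5 of the value's text form); the real
    Synapse hash differs, but it is likewise deterministic.
    """
    digest = hashlib.md5(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_distributions


# Hypothetical rows: (customer_id, amount)
rows = [("CUST-001", 10.0), ("CUST-002", 25.5), ("CUST-001", 7.25)]

buckets = {}
for customer_id, amount in rows:
    buckets.setdefault(distribution_for(customer_id), []).append((customer_id, amount))

# Both CUST-001 rows end up in the same bucket.
print(buckets[distribution_for("CUST-001")])
```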
The paper, which I agree is a bit confusing on this point, probably advises using hash distribution on a natural key rather than on an artificial surrogate key (an auto-increment identity). The reason, as stated in the Synapse documentation linked above, is:
"Since identical values always hash to the same distribution, SQL Analytics has built-in knowledge of the row locations. In dedicated SQL pool this knowledge is used to minimize data movement during queries, which improves query performance."
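So instead of hashing an auto-increment surrogate key, which behaves just like a row number, it is better to hash a natural key, especially if your table contains many duplicate values for that key. Rows sharing the same key then land in the same distribution, so joins or aggregations on that key need less data movement.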
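To illustrate the data-movement point, here is a sketch with two hypothetical tables: customers, which are always hashed on the natural key `customer_code`, and orders. It counts how many orders sit in a different distribution from their matching customer and would therefore have to be moved before a join. The same stand-in hash as before is assumed, and `order_id` plays the role of the auto-increment surrogate key:

```python
import hashlib

NUM_DISTRIBUTIONS = 60


def distribution_for(value):
    # Hypothetical stand-in hash; only its determinism matters here.
    digest = hashlib.md5(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_DISTRIBUTIONS


# Hypothetical data: 200 customers, 1,000 orders referencing them by natural key.
customer_codes = [f"C{i:03d}" for i in range(200)]
customer_location = {code: distribution_for(code) for code in customer_codes}
orders = [{"order_id": n, "customer_code": f"C{n % 200:03d}"} for n in range(1, 1001)]


def rows_to_move(order_dist_col):
    """Count orders not co-located with their customer when orders are
    hash-distributed on order_dist_col."""
    return sum(
        1
        for order in orders
        if distribution_for(order[order_dist_col]) != customer_location[order["customer_code"]]
    )


print(rows_to_move("customer_code"))  # 0: every join pair is co-located
print(rows_to_move("order_id"))       # nonzero: orders must be shuffled before the join
```

With a uniform hash, distributing orders on `order_id` would leave roughly 59 in 60 orders away from their customer.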
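However, if your data is clean, I doubt it makes much difference. The key point is to distribute on a column that has many unique values and that is not a date. All rows for the same date land in the same distribution, so a query filtering on one date is processed by a single distribution.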
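A quick way to see why a date is a poor choice: the sketch below hashes a hypothetical week of sales rows once on `sale_date` and once on a unique `sale_id`. It reports how many distributions receive data and the size of the largest one. The stand-in hash is again an assumption:

```python
import hashlib
from collections import Counter
from datetime import date, timedelta

NUM_DISTRIBUTIONS = 60


def distribution_for(value):
    # Hypothetical stand-in hash; only its determinism matters here.
    digest = hashlib.md5(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_DISTRIBUTIONS


# Hypothetical sales: 7,000 rows over 7 dates, each with a unique sale_id.
start = date(2024, 1, 1)
rows = [{"sale_id": n, "sale_date": start + timedelta(days=n % 7)} for n in range(7000)]


def spread(column):
    """Return (distributions used, rows in the fullest distribution)."""
    counts = Counter(distribution_for(row[column]) for row in rows)
    return len(counts), max(counts.values())


print(spread("sale_date"))  # at most 7 distributions hold all the data
print(spread("sale_id"))    # rows spread over many of the 60 distributions
```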
This doesn't affect the choice of keys in the Data Vault itself. You can use auto-increment surrogate keys, MD5 hash keys as in the newer Data Vault 2.0 approach (not the same as the distribution hash referred to in the paper), or natural keys. All three methods are possible.
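For comparison, a Data Vault 2.0 hash key is a value you compute and store in the hub. The sketch below assumes a common convention: trim and upper-case each business-key part, join the parts with a delimiter, then MD5-hash the result. The function name and the `"||"` delimiter are hypothetical, and project standards vary:

```python
import hashlib


def hub_hash_key(*business_key_parts, delimiter="||"):
    """MD5 hash key for a hub row, built from one or more business-key parts.

    Assumed convention: strip and upper-case each part, join with a
    delimiter, hash the UTF-8 bytes, and store the upper-case hex digest.
    """
    normalized = delimiter.join(str(part).strip().upper() for part in business_key_parts)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest().upper()


key_a = hub_hash_key(" cust-001 ")
key_b = hub_hash_key("CUST-001")
print(key_a == key_b)  # True: same business key after normalization
print(len(key_a))      # 32 hex characters
```

The delimiter keeps composite keys such as ("A", "B") and ("AB",) from producing the same input string.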
Best,
Philippe