Use a hash-based pick

- See TidyObsidian/find-duplicate-blocks.py

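Use a hash-based pick: sort each block's tokens by a stable hash (e.g., `sha1(token)`), then take the first N. Python's built-in `hash(token)` is only consistent within a single process, because string hashing is salted per run unless `PYTHONHASHSEED` is set, so prefer a digest such as SHA-1 if the picks need to be reproducible across runs. Keeping the N smallest hash values is a bottom-k form of MinHash: blocks with high Jaccard similarity are likely to share picked tokens, and the picks tend to distribute blocks more evenly across the token space.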

This change shrinks `candidate_indices` per block, so the inner Jaccard loop runs far fewer times.
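
A minimal sketch of the pick described above, not the code in `find-duplicate-blocks.py`. The names `stable_hash` and `pick_tokens` are hypothetical, and truncating the SHA-1 digest to 8 bytes is an assumption made only to keep the sort keys small.

```python
import hashlib


def stable_hash(token: str) -> int:
    """Return a hash of the token that is identical across runs and machines."""
    digest = hashlib.sha1(token.encode("utf-8")).digest()
    # Assumption: the first 8 bytes are enough to order tokens.
    return int.from_bytes(digest[:8], "big")


def pick_tokens(tokens, n):
    """Pick the n distinct tokens with the smallest stable hash values."""
    # Ties on the hash are broken by the token itself so the result is deterministic.
    return sorted(set(tokens), key=lambda t: (stable_hash(t), t))[:n]


if __name__ == "__main__":
    block_a = "the quick brown fox jumps over the lazy dog".split()
    block_b = "the quick brown fox leaps over the lazy dog".split()
    # Similar blocks usually share most of their picked tokens,
    # so they still end up as candidates for each other.
    print(pick_tokens(block_a, 3))
    print(pick_tokens(block_b, 3))
```

Only the picked tokens would be used as index keys, so each block is looked up under at most N tokens instead of all of them.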