This brings the wall-clock time down approximately in proportion
to the number of CPUs (or possibly even more, e.g. here 16 min 44 s went
down to less than 1 min with 12 CPUs).
I added a retry loop on failure, because the server sometimes returns
an invalid answer. That also happens when running serially, but it
seems to happen more often in parallel mode.
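
For illustration, a retry loop of this kind could look roughly like the
sketch below. It is written in Python and is not the actual code: the
names call_with_retry, fetch and is_valid, the attempt limit and the
linear back-off are all assumptions made for the example.

```python
import time


def call_with_retry(fetch, is_valid, max_attempts=5, delay=0.0):
    """Call fetch() until is_valid(result) is true or attempts run out.

    fetch and is_valid are hypothetical callables standing in for the
    real server request and the real validity check.
    """
    last = None
    for attempt in range(1, max_attempts + 1):
        last = fetch()
        if is_valid(last):
            return last
        if attempt < max_attempts:
            # Wait a little longer after each failed attempt.
            time.sleep(delay * attempt)
    raise RuntimeError(
        f"no valid answer after {max_attempts} attempts (last: {last!r})"
    )
```

Capping the number of attempts keeps a worker from looping forever if
the server stays broken, and the growing delay may ease the load when
many workers retry at the same time.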