Add 64-bit versions of clz and family
Where algorithms naturally work with 64-bit quantities, these functions lend themselves to even better optimisation under AArch64 than their 32-bit counterparts.
Also opt back in to the softload_gnu CI job, since this is one of the components for which it already passes.
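
For illustration only, here is a minimal sketch of what 64-bit count-leading-zeros and count-trailing-zeros helpers might look like. The names `clz64_sketch` and `ctz64_sketch` are hypothetical and are not the identifiers added by this change. The sketch assumes a GCC- or Clang-compatible compiler that provides `__builtin_clzll` and `__builtin_ctzll`. Returning 64 for a zero input is also an assumption, made because the builtins are undefined for zero.

```c
#include <stdint.h>

/* Hypothetical helper: count leading zero bits in a 64-bit value.
 * Assumes a GCC/Clang-style compiler; not the actual API of this change. */
static inline unsigned clz64_sketch(uint64_t x)
{
    if (x == 0)
        return 64;  /* assumed convention: the builtin is undefined for 0 */
    return (unsigned)__builtin_clzll(x);
}

/* Hypothetical helper: count trailing zero bits in a 64-bit value. */
static inline unsigned ctz64_sketch(uint64_t x)
{
    if (x == 0)
        return 64;  /* assumed convention: the builtin is undefined for 0 */
    return (unsigned)__builtin_ctzll(x);
}
```

With a 64-bit register available, a compiler targeting AArch64 can typically handle the non-zero case without splitting the value into two 32-bit halves, which is the kind of optimisation the description above refers to.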