Adapt clz (and family) to use CLZ instructions where available
Ideally, I'd also like to be able to define inline versions of these functions that use inline assembler, though that has to be conditional on the earliest targeted CPU. To that end, I'd suggest extending the compiler to predefine __ARM_FEATURE_CLZ when the -cpu switch indicates Armv5 or higher, although that is not required for this version of this MR.
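For illustration only, here is a rough sketch of what such a conditional inline version might look like. The helper name my_clz32 is hypothetical and not part of this MR, and the inline assembler uses GCC-style syntax purely as an assumption; the syntax for the compiler targeted here may well differ. When __ARM_FEATURE_CLZ is not predefined, the sketch falls back to a portable loop.

```c
#include <stdint.h>

/* Hypothetical helper, not taken from this MR. Counts leading zero bits
 * in a 32-bit value and returns 32 for an input of 0, which matches the
 * behaviour of the Arm CLZ instruction. */
static inline unsigned my_clz32(uint32_t x)
{
#if defined(__ARM_FEATURE_CLZ) && defined(__GNUC__) && defined(__arm__)
    /* Assumption: GCC-style extended asm on a 32-bit Arm target.
     * Other compilers would need their own inline assembler syntax. */
    unsigned result;
    __asm__ ("clz %0, %1" : "=r" (result) : "r" (x));
    return result;
#else
    /* Portable fallback for CPUs (or compilers) without CLZ support. */
    unsigned n = 0;
    if (x == 0)
        return 32;
    while ((x & 0x80000000u) == 0) {
        x <<= 1;
        n++;
    }
    return n;
#endif
}
```

With something along these lines, callers would get the single-instruction form only when the earliest targeted CPU is known to support it, and the loop otherwise.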
Also enable GitLab CI while we're at it.