Add myBX, myBLX and CallI; use BX/BLX if supported.
BLX pushes the return address onto the CPU's call-return stack, so the matching return branch can be predicted, which may improve performance.
BX is allegedly faster on some CPUs.
Both are also required for interworking with Thumb code before ARMv7.
When building for CPUs that do not support BX/BLX, these macros use MOV instead.
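
A minimal sketch of the fallback selection, written as C string macros that build the instruction text. The names HAVE_BX, HAVE_BLX, MY_BX, MY_BLX and emit_indirect_call_r3 are hypothetical and are not the macros added by this change; the exact fallback sequences are an assumption based on the description above.

```c
#include <stdio.h>

/* Hypothetical feature switches. A real build would derive them from the
 * target architecture: BX exists from ARMv4T onward, BLX (register form)
 * from ARMv5T onward. Both default to "unsupported" here. */
#ifndef HAVE_BX
#define HAVE_BX 0
#endif
#ifndef HAVE_BLX
#define HAVE_BLX 0
#endif

/* Indirect jump: prefer BX, otherwise write the target straight into pc.
 * Assumption: the MOV form is only used on cores without BX, where no
 * ARM/Thumb interworking is possible anyway. */
#if HAVE_BX
#define MY_BX(reg) "bx " reg "\n"
#else
#define MY_BX(reg) "mov pc, " reg "\n"
#endif

/* Indirect call: prefer BLX, otherwise set lr by hand and then jump.
 * In ARM state, reading pc yields the current instruction's address + 8,
 * so "mov lr, pc" leaves lr pointing just past the following jump. */
#if HAVE_BLX
#define MY_BLX(reg) "blx " reg "\n"
#else
#define MY_BLX(reg) "mov lr, pc\n" MY_BX(reg)
#endif

/* Writes the selected call sequence for a target held in r3. */
void emit_indirect_call_r3(FILE *out)
{
    fputs(MY_BLX("r3"), out);
}
```

Building with -DHAVE_BX=1 -DHAVE_BLX=1 makes the same call site emit a single "blx r3", which is the form that feeds the return predictor.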