Address of mailbox given to GPU should have cacheability bits set correctly
On entry to HAL_SendHostMessage, we ensure the contents of the mailbox buffer are flushed out to the ARM L2 cache (if applicable) and main memory. There were a couple of instructions to fill in the top two bits of the address before passing it to the VC, but they were commented out for reasons that are not clear.
With those bits left clear, the VC will look in its L1 and L2 caches for the data in the buffer. On Pi 1 and 0, this wouldn't be too bad, since the ARM11 doesn't have its own L2 cache and would have written the data into the VC L2 cache instead, meaning that there would only be coherency problems if the VC L1 cache still contained the old contents of the address. On Pi 2-4, it's more risky, because the VC L2 cache could also be inconsistent with main memory at this point.
Reinstating the top two bits doesn't appear to cause any ill effects I can see (tested on Pi 1 and 4), so put these instructions back in.
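
For illustration only, here is a rough C sketch of what filling in the top two bits amounts to. The real change is in the HAL's assembler; the names below (vc_bus_address, mailbox_word, the VC_ALIAS_* constants) are hypothetical and not taken from the source. The alias values follow the bus address map as I understand it from the BCM2835 peripherals documentation, and the sketch assumes the buffer lies in the first 1GB of RAM and is 16-byte aligned. Which alias suits which Pi model is left to the caller.

```c
#include <assert.h>
#include <stdint.h>

/* VC bus address aliases of SDRAM, selected by the top two address bits.
 * Values as I understand them from the BCM2835 peripherals documentation. */
#define VC_ALIAS_L1_L2_CACHED 0x00000000u /* VC L1 and L2 cached */
#define VC_ALIAS_L2_COHERENT  0x40000000u /* L2 coherent, non-allocating */
#define VC_ALIAS_L2_ONLY      0x80000000u /* L2 cached only */
#define VC_ALIAS_UNCACHED     0xC0000000u /* direct, bypasses VC caches */

#define VC_ALIAS_MASK         0xC0000000u

/* Hypothetical helper: replace the top two bits of an ARM physical
 * address with the chosen cache alias to form a VC bus address. */
static uint32_t vc_bus_address(uint32_t arm_phys, uint32_t alias)
{
    return (arm_phys & ~VC_ALIAS_MASK) | (alias & VC_ALIAS_MASK);
}

/* Hypothetical helper: combine a 16-byte aligned bus address with a
 * mailbox channel number (low four bits) into one mailbox word. */
static uint32_t mailbox_word(uint32_t bus_addr, uint32_t channel)
{
    assert((bus_addr & 0xFu) == 0); /* low bits are reserved for the channel */
    assert(channel < 16u);
    return bus_addr | channel;
}
```

With these helpers, a buffer at ARM physical address 0x01234560 passed through the uncached alias becomes 0xC1234560. Leaving the top bits clear, as the commented-out code did, is the same as choosing VC_ALIAS_L1_L2_CACHED.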