diff --git a/Docs/HAL/ARMop_API b/Docs/HAL/ARMop_API
new file mode 100644
index 0000000000000000000000000000000000000000..dc1a7d897fd4ed1ab0280236795523b6a45160b7
--- /dev/null
+++ b/Docs/HAL/ARMop_API
@@ -0,0 +1,440 @@
+12345678901234567890123456789012345678901234567890123456789012345678901234567890
+
+mjs   12 Jan 2001   Early Draft
+
+
+RISC OS Kernel ARM core support
+===============================
+
+This document is concerned with the design of open ended support for
+multiple ARM cores within the RISC OS kernel, as part of the work loosely
+termed hardware abstraction. Note that the ARM core support is part of the
+OS kernel, and so is not part of the hardware abstraction layer (HAL)
+itself.
+
+Background
+----------
+
+ARM core support (including caches and MMU) has historically been coded in a
+tailored way for one or two specific variants. Since version 3.7 this has
+meant just two variants: ARM 6/7 and StrongARM SA110. A more generic
+approach is required for the next generation. This aims both to support
+several cores in a more structured way, and to cover minor variants (eg.
+cache size) with the same support code. The natural approach is to set up
+run-time vectors to a set of ARM support routines.
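+
+A minimal sketch of the run-time vector idea, in C for brevity (the real
+routines would typically be assembler). The core identifiers, table layout
+and routine names here are illustrative assumptions, not part of the design:
+
```c
#include <assert.h>

/* Hypothetical core identifiers, for illustration only. */
typedef enum { CORE_ARM7, CORE_STRONGARM, CORE_COUNT } arm_core;

/* One vector per ARM operation; only two operations are shown. */
typedef struct {
    void (*cache_clean_invalidate_all)(void);
    void (*tlb_invalidate_all)(void);
} armop_table;

/* Stub routines that only count calls, so the dispatch is observable. */
static int calls_arm7, calls_sa;
static void arm7_cache(void) { calls_arm7++; }
static void arm7_tlb(void)   { calls_arm7++; }
static void sa_cache(void)   { calls_sa++; }
static void sa_tlb(void)     { calls_sa++; }

static const armop_table tables[CORE_COUNT] = {
    [CORE_ARM7]      = { arm7_cache, arm7_tlb },
    [CORE_STRONGARM] = { sa_cache,   sa_tlb   },
};

/* Select the table once at boot; later calls go through the pointer. */
static const armop_table *armop_select(arm_core core)
{
    assert(core < CORE_COUNT);
    return &tables[core];
}
```
+
+The kernel would call armop_select once the core has been identified, and
+from then on invoke operations only through the returned table.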
+
+Note that it is currently assumed that the ARM MMU architecture will not
+change radically in future ARM cores. Hence, the kernel memory management
+algorithms remain largely unchanged. This is believed to be a reasonable
+assumption, since the last major memory management change was with Risc PC
+and ARM 610 (when the on-chip MMU was introduced).
+
+Note that all ARM support code must be 32-bit clean, as part of the 32-bit
+clean kernel.
+
+Survey of ARM core requirements
+-------------------------------
+
+At present, five broad ARM core types can be considered to be of interest:
+ARM7 (and ARM6), ARM9, ARM10, StrongARM (SA1) and XScale. These divide
+primarily in terms of cache types, and cache and TLB maintenance
+requirements. They also span a range of defined ARM architecture variants,
+which introduced variants for system operations (primarily coprocessor 15
+instructions).
+
+The current ARM architecture is version 5. This (and version 4) has some
+open ended definitions to allow code to determine cache size and types from
+CP15 registers. Hence, the design of the support code can hope to be at
+least tolerant of near future variations that are introduced.
+
+ARM7
+----
+
+ARM7 cores may be architecture 3 or 4. They differ in required coprocessor
+15 operations for the same cache and TLB control. ARM6 cores are much the
+same as architecture 3 ARM7. The general character of all these cores is of
+unified write-through caches that can only be invalidated on a global basis.
+The TLBs are also unified, and can be invalidated per entry or globally.
+
+ARM9
+----
+
+ARM9 cores are architecture 4. We ignore ARM9 variants without an MMU. The
+kernel can read cache size and features. The ARM 920 and 922 have Harvard
+caches, with writeback and writethrough capable data caches (on a page or
+section granularity). Data and instruction caches can be invalidated by
+individual lines or globally. The data cache can be cleaned by virtual
+address or cache segment/index, allowing for efficient cache maintenance.
+Data and instruction TLBs can be invalidated by entry or globally.
+
+ARM10
+-----
+
+ARM10 is architecture 5. Few details are available at present; it is likely
+to be similar to ARM9 in terms of cache features and available operations.
+
+StrongARM
+---------
+
+StrongARM is architecture 4. StrongARMs have Harvard caches, the data cache
+being writeback only (no writethrough option). The data cache can only be
+globally cleaned in an indirect manner, by reading from otherwise unused
+address space. This is inefficient because it requires external (to the
+core) reads on the bus. In particular, the minimum cost of a clean, for a
+nearly clean cache, is high. The data cache supports clean and invalidate by
+individual virtual lines, so this is reasonably efficient for small ranges
+of address. The data TLB can be invalidated by entry or globally.
+
+The instruction cache can only be invalidated globally. This is inefficient
+for cases such as IMBs over a small range (dynamic code). The instruction
+TLB can only be invalidated globally.
+
+Some StrongARM variants have a mini data cache. This is selected over the
+main cache on a section or page by using the cacheable/bufferable bits set to
+C=1,B=0 in the MMU (this is not standard ARM architecture). The mini data
+cache is writeback and must be cleaned in the same manner as the main data
+cache.
+
+XScale
+------
+
+XScale is architecture 5. It implements Harvard caches, the data cache being
+writeback or writethrough (on a page or section granularity). Data and
+instruction caches can be invalidated by individual lines or globally. The
+data cache can be fully cleaned by allocating lines from otherwise unused
+address space. Unlike StrongARM, no external reads are needed for the clean
+operation, so that cache maintenance is efficient.
+
+XScale has a mini data cache. This is only available by using extension bits
+in the MMU. This extension is not documented in the current manual for
+architecture 5, but will presumably be properly recognised by ARM. It should
+be a reasonably straightforward extension for RISC OS. The mini data cache
+can only be cleaned by inefficient indirect reads as on StrongARM. However,
+for XScale, the whole mini data cache can be configured as writethrough to
+obviate this problem. The most likely use for RISC OS is to map screen
+memory as mini cacheable, when writethrough caching will also be highly
+desirable to prevent delayed screen update.
+
+The instruction and data TLBs can each be invalidated by entry or globally.
+
+
+Kernel ARM operations
+---------------------
+
+This section lists the definitions and API of the set of ARM operations
+required by the kernel for each major ARM type that is to be supported. Some
+operations may be very simple on some ARMs. Others may need support from the
+kernel environment - for example, readable parameters that have been
+determined at boot, or address space available for cache clean operations.
+
+The general rules for register usage and preservation in calling these
+operations are:
+
+  - any parameters are passed in r0,r1 etc. as required
+  - r0 may be used as a scratch register
+  - the routines see a valid stack via sp, with at least 16 words available
+  - lr is the return link as required
+  - on exit, all registers except r0 and lr must be preserved
+
+Note that where register values are given as logical addresses, these are
+RISC OS logical addresses. The equivalent ARM terminology is virtual address
+(VA), or modified virtual address (MVA) for architectures with the fast
+context switch extension.
+
+Note also that where cache invalidation is required, it is implicit that any
+associated operations for a particular ARM should be performed also. The
+most obvious example is for an ARM with branch prediction, where it may be
+necessary to invalidate a branch cache wherever instruction cache
+invalidation is to be performed.
+
+Any operation that is a null operation on the given ARM should be
+implemented as a single return instruction:
+
+  MOV pc, lr
+
+
+-- Cache_CleanInvalidateAll
+
+The cache or caches are to be globally invalidated, with cleaning of any
+writeback data being properly performed. 
+
+   entry: -
+   exit:  -
+
+   IRQs are enabled
+   call is not reentrant
+
+Note that any write buffer draining should also be performed by this
+operation, so that memory is fully updated with respect to any writeback
+data.
+
+The OS only expects the invalidation to be with respect to instructions/data
+that are not involved in any currently active interrupts. In other words, it
+is expected and desirable that interrupts remain enabled during any extended
+clean operation, in order to avoid impact on interrupt latency.
+
+-- Cache_CleanAll
+
+The unified cache or data cache is to be globally cleaned (any writeback data
+updated to memory). Invalidation is not required.
+
+   entry: -
+   exit:  -
+
+   IRQs are enabled
+   call is not reentrant
+
+Note that any write buffer draining should also be performed by this
+operation, so that memory is fully updated with respect to any writeback
+data.
+
+The OS only expects the cleaning to be with respect to data that are not
+involved in any currently active interrupts. In other words, it is expected
+and desirable that interrupts remain enabled during any extended clean
+operation, in order to avoid impact on interrupt latency.
+
+-- Cache_InvalidateAll
+
+The cache or caches are to be globally invalidated. Cleaning of any writeback
+data is not to be performed.
+
+   entry: -
+   exit:  -
+
+   IRQs are enabled
+   call is not reentrant
+
+This call is only required for special restart use, since it implies that
+any writeback data are either irrelevant or not valid. It should be a very
+simple operation on all ARMs.
+
+-- Cache_RangeThreshold
+
+Return a threshold value for an address range, above which it is advisable
+to globally clean and/or invalidate caches, for performance reasons. For a
+range less than or equal to the threshold, a ranged cache operation is
+recommended.
+
+   entry: -
+   exit:  r0 = threshold value (bytes)
+
+   IRQs are enabled
+   call is not reentrant
+
+This call returns a value that the kernel may use to select between strategies
+in some cache operations. This threshold may also be of use to some of the
+ARM operations themselves (although they should typically be able to read
+the parameter more directly).
+
+The exact value is unlikely to be critical, but a sensible value may depend
+on both the ARM and external factors such as memory bus speed.
+
+
+-- TLB_InvalidateAll
+
+The TLB or TLBs are to be globally invalidated.
+
+   entry: -
+   exit:  -
+
+   IRQs are enabled
+   call is not reentrant
+
+-- TLB_InvalidateEntry
+
+The TLB or TLBs are to be invalidated for the entry at the given logical
+address.
+
+   entry: r0 = logical address of entry to invalidate (page aligned)
+   exit:  -
+
+   IRQs are enabled
+   call is not reentrant
+
+The address will always be page aligned (4k).
+
+-- WriteBuffer_Drain
+
+Any write buffers are to be drained so that any pending writes are guaranteed
+completed to memory.
+
+   entry: -
+   exit:  -
+
+   IRQs are enabled
+   call is not reentrant
+
+-- IMB_Full
+
+A global instruction memory barrier (IMB) is to be performed.
+
+   entry: -
+   exit:  -
+
+   IRQs are enabled
+   call is not reentrant
+
+An IMB is an operation that should be performed after new instructions have
+been stored and before they are executed. It guarantees correct operation
+for code modification (eg. something as simple as loading code to be
+executed).
+
+On some ARMs, this operation may be null. On ARMs with Harvard architecture
+this typically consists of:
+
+  1) clean data cache
+  2) drain write buffer
+  3) invalidate instruction cache
+
+There may be other considerations such as invalidating branch prediction
+caches.
+
+-- IMB_Range
+
+An instruction memory barrier (IMB) is to be performed over a logical
+address range.
+
+   entry: r0 = logical address of start of range
+          r1 = logical address of end of range (exclusive)
+          Note that r0 and r1 are aligned on cache line boundaries
+   exit: -
+
+   IRQs are enabled
+   call is not reentrant
+
+An IMB is an operation that should be performed after new instructions have
+been stored and before they are executed. It guarantees correct operation
+for code modification (eg. something as simple as loading code to be
+executed).
+
+On some ARMs, this operation may be null. On ARMs with Harvard architecture
+this typically consists of:
+
+  1) clean data cache over the range
+  2) drain write buffer
+  3) invalidate instruction cache over the range
+
+There may be other considerations such as invalidating branch prediction
+caches.
+
+Note that the range may be very large. The implementation of this call is
+typically expected to use a threshold (related to Cache_RangeThreshold) to
+decide when to perform IMB_Full instead, being faster for large ranges.
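+
+The choice between the ranged and the global operation can be sketched as
+follows. This is an illustration only: the function and type names are
+hypothetical, and the threshold is assumed to be a byte count of the kind
+returned by Cache_RangeThreshold.
+
```c
#include <assert.h>
#include <stdint.h>

typedef enum { IMB_USE_RANGE, IMB_USE_FULL } imb_choice;

/* start is inclusive and end is exclusive, as for IMB_Range.
 * A range no larger than the threshold is handled by ranged operations;
 * anything larger falls back to the global IMB. */
static imb_choice imb_range_choose(uint32_t start, uint32_t end,
                                   uint32_t threshold)
{
    assert(end >= start);
    return (end - start <= threshold) ? IMB_USE_RANGE : IMB_USE_FULL;
}
```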
+
+-- MMU_Changing
+
+The global MMU mapping is about to be changed.
+
+   entry: -
+   exit:  -
+
+   IRQs are enabled
+   call is not reentrant
+
+The operation must typically perform the following:
+
+  1) globally clean and invalidate all caches
+  2) drain write buffer
+  3) globally invalidate TLB or TLBs
+
+Note that it should not be necessary to disable IRQs. The OS ensures that
+remappings do not affect currently active interrupts.
+
+-- MMU_ChangingEntry
+
+The MMU mapping is about to be changed for a single page entry (4k).
+
+   entry: r0 = logical address of entry (page aligned)
+   exit:  -
+
+   IRQs are enabled
+   call is not reentrant
+
+The operation must typically perform the following:
+
+  1) clean and invalidate all caches over the 4k range of the page
+  2) drain write buffer
+  3) invalidate TLB or TLBs for the entry
+
+Note that it should not be necessary to disable IRQs. The OS ensures that
+remappings do not affect currently active interrupts.
+
+-- MMU_ChangingUncached
+
+The MMU mapping is about to be changed in a way that globally affects
+uncacheable space.
+
+   entry: -
+   exit:  -
+
+   IRQs are enabled
+   call is not reentrant
+
+The operation must typically globally invalidate the TLB or TLBs. The OS
+guarantees that cacheable space is not affected, so cache operations are not
+required. However, there may still be considerations such as fill buffers
+that operate in uncacheable space on some ARMs.
+
+-- MMU_ChangingUncachedEntry
+
+The MMU mapping is about to be changed for a single uncacheable page entry
+(4k).
+
+   entry: r0 = logical address of entry (page aligned)
+   exit:  -
+
+   IRQs are enabled
+   call is not reentrant
+
+The operation must typically invalidate the TLB or TLBs for the entry. The
+OS guarantees that cacheable space is not affected, so cache operations are
+not required. However, there may still be considerations such as fill
+buffers that operate in uncacheable space on some ARMs.
+
+
+-- MMU_ChangingEntries
+
+The MMU mapping is about to be changed for a contiguous range of page
+entries (multiple of 4k).
+
+   entry: r0 = logical address of first page entry (page aligned)
+          r1 = number of page entries ( >= 1)
+   exit:  -
+
+   IRQs are enabled
+   call is not reentrant
+
+The operation must typically perform the following:
+
+  1) clean and invalidate all caches over the range of the pages
+  2) drain write buffer
+  3) invalidate TLB or TLBs over the range of the entries
+
+Note that it should not be necessary to disable IRQs. The OS ensures that
+remappings do not affect currently active interrupts.
+
+Note that the number of entries may be large. The operation is typically
+expected to use a reasonable threshold, above which it performs a global
+operation instead for speed reasons.
+
+-- MMU_ChangingUncachedEntries
+
+The MMU mapping is about to be changed for a contiguous range of uncacheable
+page entries (multiple of 4k).
+
+   entry: r0 = logical address of first page entry (page aligned)
+          r1 = number of page entries ( >= 1)
+   exit:  -
+
+   IRQs are enabled
+   call is not reentrant
+
+The operation must typically invalidate the TLB or TLBs over the range of
+the entries. The OS guarantees that cacheable space is not affected, so
+cache operations are not required. However, there may still be
+considerations such as fill buffers that operate in uncacheable space on
+some ARMs.
+
+Note that the number of entries may be large. The operation is typically
+expected to use a reasonable threshold, above which it performs a global
+operation instead for speed reasons.
diff --git a/Docs/HAL/HAL_API b/Docs/HAL/HAL_API
new file mode 100644
index 0000000000000000000000000000000000000000..45526eb194e85756a7daf3e035bf91dc58e0c57f
--- /dev/null
+++ b/Docs/HAL/HAL_API
@@ -0,0 +1,1054 @@
+12345678901234567890123456789012345678901234567890123456789012345678901234567890
+
+2001 - a HAL API
+----------------
+
+mjs   12 Jan 2001   Early Draft  (mjs,kjb)
+
+
+RISC OS Hardware Abstraction
+============================
+
+Background
+----------
+
+This document is concerned with low level developments of RISC OS in order
+to support future ARM based platforms. Loosely, this has been considered as
+creating a hardware abstraction layer, or HAL. This term is a useful
+shorthand, but with the following caveats. Firstly, the HAL work is only
+envisaged to provide a modest level of low level abstraction (at least for
+the next OS generation). Secondly, significant non-HAL work, at all levels,
+is required to make a useful next generation RISC OS.
+
+Note that most of the hardware dependence of the OS is already confined to
+low level code (essentially, the kernel and device drivers). Here we assume
+that the OS is only expected to run on an ARM processor, and with somewhat
+restricted choices of I/O hardware (eg. friendly video pixel formats).
+
+Up to now (version 4), RISC OS has evolved while closely coupled to an ARM
+processor core and to an Acorn proprietary chip set (video, memory, I/O). It
+has remained highly hardware specific. For the purposes of further
+investment in RISC OS, three key areas of hardware dependence must be
+addressed: 32-bit clean operation, support for new ARM cores, and support for
+various video, memory and I/O configurations. Without all of these, the OS
+is essentially useless on foreseeable future hardware.
+
+
+32-bit clean code
+-----------------
+
+All RISC OS code must run 32-bit clean on future releases. This is because
+all ARM cores from ARM9 onwards (and also some ARM7 variants) have entirely
+removed support for RISC OS's native 26-bit modes. Note that 32-bit clean
+code is not precluded from working on the older ARM cores (back to ARM 610).
+
+With more care, 32/26-bit agnostic code can be written to work back to ARM
+2. This may be of interest to module and application code, but note that the
+OS kernel itself is only expected to work back to ARM 610, since an MMU is
+required.
+
+Much of the work required is routine and has been done for the OS itself
+(though long term weeding of consequent bugs is required). A 32-bit
+compatible shared C library has been released in order to encourage
+conversion of application code by third parties. This work is not part of
+hardware abstraction and is not considered further in this document.
+
+Support for newer ARM cores
+---------------------------
+
+ARM core support (including caches and MMU) has historically been coded in a
+tailored way for one or two specific variants. Since version 3.7 this has
+meant just two variants: ARM 6/7 and StrongARM SA110. A more generic
+approach is required for the next generation. This aims both to support
+several cores in a more structured way, and to cover minor variants (eg.
+cache size) with the same support code. The natural approach is to set up
+run-time vectors to a set of ARM support routines.
+
+Note that it is currently assumed that the ARM MMU architecture will not
+change radically in future ARM cores. Hence, the kernel memory management
+algorithms remain largely unchanged. This is believed to be a reasonable
+assumption, since the last major memory management change was with Risc PC
+and ARM 610 (when the on-chip MMU was introduced).
+
+ARM core support is confined almost entirely to the kernel, and is therefore
+not strictly part of the HAL. The HAL will only be concerned with any
+external factors such as clock selection. Only HAL aspects are considered
+further in this document.
+
+Hardware abstraction layer
+--------------------------
+
+A simple HAL is to be inserted underneath RISC OS. This will provide two
+functions. Firstly, it will be responsible for initial system bootstrap,
+much like a PC BIOS, and secondly it will provide simple APIs to allow
+hardware access.
+
+The HAL APIs are a thin veneer on top of the hardware. They are designed to
+act as replacements for all the hardware knowledge and manipulation
+performed by the RISC OS Kernel, together with some APIs that will allow
+RISC OS driver modules to become more hardware independent. No attempt will
+be made (at this stage) to perform such tasks as separating the video
+drivers from the Kernel, for example.
+
+One tricky design decision is the amount of abstraction to aim for. Too
+little, and the system is not flexible enough; too much and HAL design is
+needlessly complicated for simple hardware. The present design tries to err
+on the side of too little abstraction. Extra, more abstract APIs can always
+be added later. So, initially, for example, the serial device API will just
+provide discovery, some capability flags and the base address of the UART
+register set. This will be sufficient for the vast majority of devices. If
+new hardware comes along later that isn't UART compatible, a new API can be
+defined. Simple hardware can continue to just report UART base addresses.
+
+The bulk of device driver implementation remains in RISC OS modules - the
+difference is that the HAL will allow many device drivers to avoid direct
+access to hardware. For example, PS2Driver can now use HAL calls to send and
+receive bytes through the PS/2 ports, and thus is no longer tied to IOMD's
+PS/2 hardware. Similarly, interrupt masking and unmasking, as performed by
+any device vector claimant, is now a HAL call. Note that HAL calls are
+normally performed via a Kernel SWI - alternatively the Kernel can return
+the address of specific HAL routines. There is nothing to stop specific
+drivers talking to hardware directly, as long as they accept that this will
+tie them to specific devices.
+
+This dividing line between the HAL and RISC OS driver modules is crucial. If
+the HAL does everything, then we have achieved nothing - we have just as
+much hardware dependent code - it's just in a different place. It is
+important to place the dividing line as close to the hardware as possible,
+to make it easy to design a HAL and to prevent large amounts of code 
+duplication between HALs for different platforms.
+
+The Kernel remains responsible for the ARM's MMU and all other aspects of
+the CPU core. The HAL requires no knowledge of details of ARM
+implementations, and thus any HAL implementation should work on any
+processor from the ARM610 onwards.
+
+
+HAL/OS layout and headers
+-------------------------
+
+The OS is linked to run at a particular base address. Pre-HAL OS's were
+linked to run at <n>MB, that is on a MB alignment to allow efficient MMU
+section mapping. For simplicity, the HAL/OS layout can allow a fixed maximum
+size for the HAL, currently set at 64k. Then the OS base address will be
+<n>MB+64k. This allows a HAL of up to 64k to be placed at the bottom of a
+ROM below the OS, and the HAL/OS combination to still be section-mapped. A
+ROM should be portable to hardware variants merely by replacing the 64k HAL
+block.
+
+A more flexible system would only sacrifice MMU mapping efficiency. The HAL
+and OS could be placed in any desired way, provided that each is contiguous
+in physical memory.
+
+The OS starts with a header including a magic word - this aids probing and
+location of images. The OS header format is defined as:
+
+Word 0: Magic word ("OSIm" - &6D49534F)
+Word 1: Flags (currently should be 0)
+Word 2: Image size (bytes)
+Word 3: Offset (bytes) from OS base to table of OS routine entry points
+Word 4: Number of entries in table
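+
+As an illustration, probing for this header might look like the following C
+sketch. The structure and function names are hypothetical; the only facts
+taken from the definition above are the word order and the magic value. The
+words are assumed to be stored little-endian, which is consistent with
+"OSIm" reading as &6D49534F.
+
```c
#include <assert.h>
#include <stdint.h>

#define OS_MAGIC 0x6D49534Fu   /* "OSIm" read as a little-endian word */

typedef struct {
    uint32_t magic;         /* word 0 */
    uint32_t flags;         /* word 1, currently 0 */
    uint32_t image_size;    /* word 2, bytes */
    uint32_t entry_offset;  /* word 3, bytes from OS base to entry table */
    uint32_t entry_count;   /* word 4 */
} os_header;

/* Read one little-endian 32-bit word from a byte buffer. */
static uint32_t read_word(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 1 and fills *out if the buffer starts with the magic word,
 * otherwise returns 0 and leaves *out untouched. */
static int os_header_probe(const uint8_t *image, os_header *out)
{
    assert(image != 0 && out != 0);
    if (read_word(image) != OS_MAGIC)
        return 0;
    out->magic        = OS_MAGIC;
    out->flags        = read_word(image + 4);
    out->image_size   = read_word(image + 8);
    out->entry_offset = read_word(image + 12);
    out->entry_count  = read_word(image + 16);
    return 1;
}
```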
+
+The HAL itself should have whatever header is required to start the system.
+For example on ARM7500 16->32 bit switch code is required, and on the
+9500 parts a special ROM header and checksum must be present. A HAL
+descriptor block, instead of a header, can be placed somewhere in the HAL. A
+pointer to this block is passed by the HAL to the OS in the OS_Start call:
+
+Word 0: Flags (currently should be 0)
+Word 1: Offset (bytes) from descriptor to start of HAL (will be <= 0)
+Word 2: HAL size (bytes)
+Word 3: Offset (bytes) from descriptor to table of HAL routine entry points
+Word 4: Number of entries in table
+Word 5: Size of HAL static workspace required (bytes)
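+
+Since both offsets are relative to the descriptor itself, the OS can locate
+the HAL and its entry table from the descriptor pointer alone. A sketch in
+C, with hypothetical structure and function names, and addresses modelled
+as plain 32-bit values:
+
```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t flags;             /* word 0, currently 0 */
    int32_t  hal_start_offset;  /* word 1, descriptor to HAL start, <= 0 */
    uint32_t hal_size;          /* word 2, bytes */
    int32_t  entry_offset;      /* word 3, descriptor to entry table */
    uint32_t entry_count;       /* word 4 */
    uint32_t workspace_size;    /* word 5, bytes */
} hal_descriptor;

/* Address of the start of the HAL, given the descriptor's own address.
 * Unsigned arithmetic wraps, so a negative offset subtracts as intended. */
static uint32_t hal_start(uint32_t desc_addr, const hal_descriptor *d)
{
    assert(d->hal_start_offset <= 0);
    return desc_addr + (uint32_t)d->hal_start_offset;
}

/* Address of the table of HAL routine entry points. */
static uint32_t hal_entry_table(uint32_t desc_addr, const hal_descriptor *d)
{
    return desc_addr + (uint32_t)d->entry_offset;
}
```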
+
+Calling standards
+-----------------
+
+RISC OS and the HAL are two separate entities, potentially linked
+separately. The OS and the HAL are each defined with a set of callable
+routines for the OS/HAL interface. Each HAL entry or each OS entry is given
+a unique (arbitrary) number, starting at 0. The offset to each entry is
+given in an entry table. Calls can be made manually through this table, or
+stubs could be created at run-time to allow high-level language calls.
+
+Every entry (up to the declared maximum) must exist. If not implemented, a
+failure response must be returned, or the call ignored, as appropriate. Note
+that the OS interface for the HAL should not be confused with standard OS
+calls (SWIs) already defined for use in the OS itself.
+
+To permit high-level language use in the future, the procedure call standard
+in both directions is the ARM-Thumb Procedure Call Standard (ATPCS) as
+defined by ARM, with no use of floating point, no stack limit checking, no
+frame pointers, and no Thumb interworking. HAL code is expected to be ROPI
+and RWPI (ie. all its read-only segments and read-write segments are
+position-independent). Hence the HAL is called with its static workspace
+base (sb) in r9. The OS kernel is neither ROPI nor RWPI (except for the
+pre-MMU calls, which are ROPI). OS calls from the HAL do not use r9 as a
+static base.
+
+The HAL will always be called in a privileged mode - if called in an
+interrupt mode, the corresponding interrupts will be disabled. The HAL
+should not change mode. HAL code should work in both 26-bit and 32-bit modes
+(but should assume 32-bit configuration).
+
+Routines can be conveniently specified in C language syntax. Typically they
+will be written in assembler. In detail, the ATPCS register usage for HAL
+calls is as follows:
+
+  ATPCS  ARM    use                        at exit
+  a1     r0     argument 1/return value    undefined or return value
+  a2     r1     argument 2/return value    undefined or return value
+  a3     r2     argument 3/return value    undefined or return value
+  a4     r3     argument 4/return value    undefined or return value
+  v1     r4     var 1                      preserved
+  v2     r5     var 2                      preserved
+  v3     r6     var 3                      preserved
+  v4     r7     var 4                      preserved
+  v5     r8     var 5                      preserved
+  sb     r9     static workspace base      preserved
+  v7     r10    var 7                      preserved
+  v8     r11    var 8                      preserved
+  ip     r12    scratch                    undefined
+  sp     r13    stack pointer              preserved
+  lr     r14    return link                undefined
+
+The static workspace base points to the HAL workspace.
+
+Note that HAL calls must be assumed to corrupt all of r0-r3,r12,r14. A 
+function return value may be in r0, or (less commonly) multiple return
+words in two or more of r0-r3.
+
+If there are more than 4 arguments to a HAL call, arguments 5 onwards must
+be pushed onto the stack before the call, and discarded after return. (The
+order of arguments is with argument 5 at top of stack, ie. first to be
+pulled.)
+
+The register usage for the OS entry points is the same, except that r9 is
+not used as a static base (it is preserved).
+
+When using assembler, the register usage may seem somewhat restricted, and
+cumbersome for more than 4 arguments. However, it is typically a reasonable
+balance for function calls (as a PCS would aim to be), and does not preclude
+implementation in C for example. Old kernel code may require register
+preserving overhead to insert HAL calls easily, but for most calls this is
+insignificant, compared to hardware access costs.
+
+Initialisation sequence
+-----------------------
+
+After system reset, bootstrap code in the HAL will do minimal hardware
+set-up ... (remainder to be defined)
+
+HAL entry points
+----------------
+
+These routines are expected to be called from the OS (Kernel). See the
+'Calling standards' section for general information on register usage and so
+forth.
+
+Interrupts
+----------
+
+The HAL must provide the ability to identify, prioritise and mask IRQs, and the ability
+to mask FIQs. RISC OS supplies the ARM's processor vectors, and on an IRQ calls the HAL
+to request the identity of the highest priority interrupt.
+
+IRQ and FIQ device numbers are arbitrary, varying from system to system. They should be
+arranged to allow quick mappings to and from hardware registers, and should ideally
+be packed, starting at 0.
+
+Timers
+------
+
+The HAL must supply at least one timer capable of generating periodic
+interrupts. Each timer should generate a separate logical interrupt, and the
+interrupt must be latched. The timers must either be variable rate (period is
+a multiple of a basic granularity), or be fixed rate (period = 1*granularity).
+Optionally, a timer may be capable of reporting the time until the
+next interrupt, in units of the granularity.
+
+Counter
+-------
+
+The HAL must supply a counter that varies rapidly, appropriate for use for
+sub-millisecond timing. On many systems, this counter will form part of
+timer 0 - as such it is not required to operate when timer 0 is not running.
+On other systems, the periodic timers may have no readable latch, and a
+separate unit will be required.
+
+The counter should count down from (period-1) to 0 continuously.
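+
+Because the counter counts down and reloads, elapsed time between two
+readings has to allow for one wrap. A sketch in C, assuming that less than
+one full period has passed between the readings (the function name is
+hypothetical):
+
```c
#include <assert.h>
#include <stdint.h>

/* Ticks elapsed between two readings of a counter that counts down from
 * (period-1) to 0 and then reloads. Valid only if fewer than 'period'
 * ticks separate the two readings. */
static uint32_t counter_elapsed(uint32_t earlier, uint32_t later,
                                uint32_t period)
{
    assert(period > 0 && earlier < period && later < period);
    /* No wrap: the later reading is the smaller one. After a wrap the
     * later reading is larger, and the reload must be accounted for. */
    return (earlier >= later) ? earlier - later
                              : earlier + (period - later);
}
```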
+
+Non-volatile memory
+-------------------
+
+The HAL should provide at least 240 bytes of non-volatile memory. If no
+non-volatile memory is available, the HAL may provide fake NVRAM contents
+suitable for RISC OS - however, it is preferable that the HAL just state
+that NVRAM is not available, and RISC OS will act as though a CMOS reset has
+been performed every reset.
+
+NVRAM is typically implemented as an IIC device, so the calls are permitted
+to be slow, and to enable interrupts. The HAL is not expected to cache
+contents.
+
+If the HAL has no particular knowledge of NVMemory, then it may just say
+that "NVMemory is on IIC", and the OS will probe for CMOS/EEPROM devices on
+the IIC bus.
+
+IIC bus
+-------
+
+Many hardware designs have an IIC bus. Often, it is used only to support
+non-volatile memory, but in other systems TV tuners, TV modulators,
+microcontrollers, and arbitrary expansion cards may be fitted.
+
+Low-level and high level APIs are defined. An arbitrary number of buses is
+supported, and each can be controlled by either the low or high level API.
+The OS should normally only use one fixed API on each bus - mixing APIs is
+unpredictable.
+
+The low-level API requires the OS to control the two lines of the bus
+directly. The high-level API currently covers version 2.1 of the IIC
+protocol, and allows high-level transactions to be performed.
+
+It is expected that a HAL will always provide the low-level API on each bus,
+where possible in hardware. Using this, the OS can provide Fast mode single
+or multi-master operation. The HAL may wish to provide the high-level API
+where a dedicated IIC port with hardware assistance is available; this will
+further permit High-speed and slave operation.
+
+As it is possible that some HAL APIs (eg NVMemory), although abstracted at
+this API layer, are still actually an IIC device, a matching set of
+high-level IIC calls are provided in the OS. These give the HAL access to
+the OS IIC engine, which will make low-level HAL calls. This saves the HAL
+from implementing the full IIC protocol. To illustrate this diagrammatically:
+
+    +----------+ NVMem_Read +------------+  NVMemoryRead  +------------+
+    |          | ---------> |            | ------------>  |            |
+    |   App    |            |     OS     |  IICTransmit   |    HAL     |
+    |          |            |            | <------------  |            |
+    |          |            |            |  IICSetLines   |            |
+    |          |            |            | ------------>  |            |
+    +----------+            +------------+                +------------+
+
+The low-level calls should be fast. Interrupt status may not be altered.
+
+The following structure is used:
+
+   typedef struct { int SDA, SCL; } IICLines;
+
+High level API to be defined ...
+
+Video
+-----
+
+The HAL only attempts to abstract the hardware controller aspects of the OS
+video. It does not (yet) consider pixel formats, framestore layout or
+hardware graphics acceleration. All these would affect a great deal of the
+RISC OS graphics code that forms much of the value of the OS. This means
+that the envisaged HAL/RISC OS combination makes some specific assumptions
+about graphics framestore layout, as follows:
+
+ - memory mapped framestore
+ - expected to be contiguous physical memory, can be specific memory (eg. VRAM) 
+ - mapped as contiguous logical memory
+ - progressive raster scan in logical memory from top left pixel to bottom right
+ - start of each raster row must be word aligned
+ - number of pixels in a row should be such that row is a whole number of words
+ - spacing between start of each row is a constant number of words, possibly
+   greater than row length (via mode variable, LineLength)
+ - 1,2,4,8,16 or 32 bits per pixel (bpp)
+ - little endian pixel packing for 1,2,4 bpp (least significant bits are
+   leftmost pixels)
+ - presence of palette assumed for 1,2,4,8 bpp (8-bits per r,g,b component in
+   each entry)
+ - 16 bpp format:
+     bits 0-4       Red
+          5-9       Green
+          10-14     Blue
+          15        Supremacy (0=solid, 1=transparent)
+ - 32 bpp format:
+     bits 0-7       Red
+          8-15      Green
+          16-23     Blue
+          24-31     Supremacy (0=solid, 255=transparent)
+ - palette words are 32 bits: 
+     bits 0-7       Reserved (0), or Supremacy (0=solid, 255=transparent)
+          8-15      Red
+          16-23     Green
+          24-31     Blue
+ - pointer/cursor is assumed supported in hardware, 32x32 pixels,
+   each pixel either transparent or one of 3 paletted colours
+ - support for physically interlaced, logically progressive framestore via
+   MMU tricks and use of LineLength mode variable, currently not fully
+   integrated into kernel
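+
+As a minimal sketch of the 16 bpp, 32 bpp and palette layouts listed above,
+the C helpers below pack colour components into each format and compute the
+smallest row length that keeps a row to a whole number of words. The function
+names are illustrative only and are not part of the HAL or RISC OS API.
+
```c
#include <assert.h>
#include <stdint.h>

/* 16 bpp: bits 0-4 red, 5-9 green, 10-14 blue, bit 15 supremacy. */
uint16_t pack_pixel16(unsigned r5, unsigned g5, unsigned b5,
                      unsigned transparent)
{
    assert(r5 < 32 && g5 < 32 && b5 < 32 && transparent < 2);
    return (uint16_t)(r5 | (g5 << 5) | (b5 << 10) | (transparent << 15));
}

/* 32 bpp: bits 0-7 red, 8-15 green, 16-23 blue, 24-31 supremacy. */
uint32_t pack_pixel32(uint8_t r, uint8_t g, uint8_t b, uint8_t s)
{
    return (uint32_t)r | ((uint32_t)g << 8) | ((uint32_t)b << 16)
         | ((uint32_t)s << 24);
}

/* Palette word: bits 0-7 supremacy (or 0), 8-15 red, 16-23 green,
   24-31 blue. */
uint32_t pack_palette(uint8_t r, uint8_t g, uint8_t b, uint8_t s)
{
    return (uint32_t)s | ((uint32_t)r << 8) | ((uint32_t)g << 16)
         | ((uint32_t)b << 24);
}

/* Smallest row length in bytes for a row of width_pixels at bpp bits per
   pixel, rounded up to a whole number of 32-bit words. LineLength may be
   larger than this. */
uint32_t min_row_bytes(uint32_t width_pixels, uint32_t bpp)
{
    uint32_t bits = width_pixels * bpp;
    return ((bits + 31u) / 32u) * 4u;
}
```
+
+For example, pack_pixel16(31, 0, 0, 0) gives &001F (solid red), and a 640
+pixel row at 4 bpp needs at least 320 bytes.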
+
+Note that it is possible to support hardware where only some pixel depths
+are available, or only some fit the RISC OS assumptions. Also some hardware
+has some configurability for 'arbitrary' choices like RGB versus BGR
+ordering. Hence, the restrictions are typically much less severe than might
+first be thought.
+
+Supporting a software only pointer/cursor is feasible (much less work than
+new pixel formats) but not yet considered.
+
+Aside: RISC OS video interlace trick
+------------------------------------
+
+This trick has been used in NC/STB variants. It makes a physically interlaced
+framestore (two distinct field stores) appear as a logically progressive
+framestore, using the MMU to map many logical copies, and using the freedom
+to choose a constant logical increment between rows in the RISC OS mode
+definition. For 576 rows, say, it uses 576M of logical space. Each 1M
+(section mapped) supports a row and allows the logical address to increment
+monotonically, as the physical address alternates between (increasing rows
+of) the physical field stores. It is currently not integrated into the
+kernel, so it fudges address space allocation and poking of video variables.
+It also has the drawback of thrashing data TLBs (one entry per row).
+
+The trick requires the physical field stores to be separated by 1M plus half
+a row. The logical spacing between rows is also set to 1M plus half a row.
+The 1M logical sections are set to map alternately to the even and odd
+physical fields (the second field being offset by half a row relative to 1M
+alignment). Then the logical incrementing of rows maps alternately between
+fields, incrementing physically by 1 row between visits to the same field.
+Note that the multiple logical mapping implies uncached screen to avoid
+coherency worries, but RO uses uncached screen anyway (with exception of
+Ursula/Phoebe, now defunct). 
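+
+The address arithmetic of the trick can be checked with a short C sketch.
+This is one reading of the description above, not kernel code: it assumes
+that the logical base and the even field store are both 1M aligned, that the
+odd field store starts 1M plus half a row above the even one, and that every
+row stays inside its own 1M section. All names are hypothetical.
+
```c
#include <assert.h>
#include <stdint.h>

#define SECTION 0x100000u   /* 1M section size */

/* Logical address of display row r: rows are spaced 1M plus half a row. */
uint32_t logical_row(uint32_t logical_base, uint32_t row_bytes, uint32_t r)
{
    /* The row must not spill out of its own 1M section. */
    assert(r * (row_bytes / 2u) + row_bytes <= SECTION);
    return logical_base + r * (SECTION + row_bytes / 2u);
}

/* Physical address reached through the section mappings. Even logical
   sections map to the 1M section holding the even field; odd sections map
   to the 1M section directly above it, which holds the odd field at an
   offset of half a row. */
uint32_t physical_of(uint32_t logical_base, uint32_t even_field_pa,
                     uint32_t la)
{
    uint32_t section = (la - logical_base) / SECTION;
    uint32_t offset  = (la - logical_base) % SECTION;

    assert(logical_base % SECTION == 0 && even_field_pa % SECTION == 0);
    return even_field_pa + ((section & 1u) ? SECTION : 0u) + offset;
}
```
+
+With 1440-byte rows, logical rows 0 and 2 land on rows 0 and 1 of the even
+field, while logical rows 1 and 3 land on rows 0 and 1 of the odd field.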
+
+
+Routines in detail
+------------------
+
+[Note, plonking all routines here possibly only temporarily. May want
+routines listed in relevant sections with overview. eg. video routines
+with video section, etc.]
+
+-- HAL_Init(unsigned int *riscos_header)
+
+The OS will call HAL_Init after enabling the MMU, and initialising the HAL
+workspace (filled with 0). At this point any initialisation for the main HAL
+routines (rather than the early bootstrap code in the HAL) can be done.
+
+-- HAL_IRQEnable
+
+????
+
+-- HAL_IRQDisable
+
+????
+
+-- HAL_IRQClear
+
+????
+
+-- HAL_IRQSource
+
+????
+
+-- int HAL_Timers(void)
+
+Returns number of timers. Timers are numbered from 0 upwards. Timer 0 must
+exist.
+
+-- int HAL_TimerDevice(int timer)
+
+Returns the device number of the given timer. A device number refers to the
+IRQ device number for interrupt calls.
+
+-- unsigned int HAL_TimerGranularity(int timer)
+
+Returns the basic granularity of the given timer, in ticks per second.
+
+-- unsigned int HAL_TimerMaxPeriod(int timer)
+
+Returns maximum period of the timer, in units of Granularity. Will be 1 for
+a fixed rate timer.
+
+-- void HAL_TimerSetPeriod(int timer, unsigned int period)
+
+Sets the period of the given timer, in units of granularity. If period > 0,
+the timer will generate interrupts every (period / granularity) seconds. If
+period = 0, the timer may be stopped. This may not be possible on some
+hardware, so the corresponding interrupt should be masked in addition to
+calling this function with period 0. If period exceeds the value returned by
+HAL_TimerMaxPeriod, behaviour is undefined.
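+
+The relationship between granularity, period and interrupt rate can be
+sketched in C as follows. The four HAL_ functions here are stand-ins with
+made-up values (a 2 MHz, 16-bit timer) so that the fragment compiles on its
+own; timer_set_rate is a hypothetical caller, not part of the API.
+
```c
#include <assert.h>

/* Stand-ins for the HAL entries: a simulated 2 MHz, 16-bit timer. */
static unsigned int sim_period;

unsigned int HAL_TimerGranularity(int timer) { (void)timer; return 2000000u; }
unsigned int HAL_TimerMaxPeriod(int timer)   { (void)timer; return 65535u; }
unsigned int HAL_TimerPeriod(int timer)      { (void)timer; return sim_period; }

void HAL_TimerSetPeriod(int timer, unsigned int period)
{
    (void)timer;
    sim_period = period;
}

/* Ask for roughly hz interrupts per second. Returns the period actually in
   use afterwards, or 0 if the rate cannot be represented. */
unsigned int timer_set_rate(int timer, unsigned int hz)
{
    unsigned int period;

    assert(hz > 0);
    period = HAL_TimerGranularity(timer) / hz;
    if (period == 0 || period > HAL_TimerMaxPeriod(timer))
        return 0;   /* a period above the maximum is undefined, so refuse */
    HAL_TimerSetPeriod(timer, period);
    return HAL_TimerPeriod(timer);
}
```
+
+With these values a 100 Hz tick needs a period of 20000, whereas 10 Hz would
+need 200000, which exceeds the maximum period and is rejected.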
+
+-- unsigned int HAL_TimerPeriod(int timer)
+
+Reads the period of the given timer. This should be the actual period in use
+by the hardware, so if for example period 0 was requested and impossible,
+the actual current period should be reported.
+
+-- unsigned int HAL_TimerReadCountdown(int timer)
+
+Returns the time until the next interrupt in units of granularity, rounded
+down. If not available, 0 is returned.
+
+-- unsigned int HAL_CounterRate(void)
+
+Returns the rate of the counter in ticks per second. Typically will equal
+HAL_TimerGranularity(0).
+
+-- unsigned int HAL_CounterPeriod(void)
+
+Returns the period of the counter, in ticks. Typically will equal
+HAL_TimerPeriod(0).
+
+-- unsigned int HAL_CounterRead(void)
+
+Reads the current counter value. Typically will equal
+HAL_TimerReadCountdown(0).
+
+-- void HAL_CounterDelay(unsigned int microseconds)
+
+Delay for at least the specified number of microseconds.
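+
+One way a HAL might build such a delay on top of the counter calls is
+sketched below. It assumes the counter counts down and wraps once per
+period (as HAL_TimerReadCountdown would), and that it is polled more often
+than once per period. The three HAL_Counter functions are simulated here
+with made-up values, and counter_delay is a hypothetical name.
+
```c
#include <assert.h>

/* Simulated counter, standing in for real hardware: it counts down from
   period-1 to 0 and wraps, advancing 7 ticks on every read. */
static unsigned int sim_count = 19999u;

unsigned int HAL_CounterRate(void)   { return 2000000u; } /* ticks/second */
unsigned int HAL_CounterPeriod(void) { return 20000u; }   /* ticks/wrap */

unsigned int HAL_CounterRead(void)
{
    unsigned int now = sim_count;
    sim_count = (sim_count >= 7u) ? sim_count - 7u
                                  : sim_count + HAL_CounterPeriod() - 7u;
    return now;
}

/* Busy-wait for at least the given number of microseconds. Returns the
   number of counter ticks actually waited. */
unsigned long long counter_delay(unsigned int microseconds)
{
    unsigned int period = HAL_CounterPeriod();
    /* Ticks required, rounded up; 64-bit arithmetic avoids overflow. */
    unsigned long long need =
        ((unsigned long long)microseconds * HAL_CounterRate() + 999999u)
        / 1000000u;
    unsigned long long elapsed = 0;
    unsigned int last = HAL_CounterRead();

    assert(period > 0);
    while (elapsed < need) {
        unsigned int now = HAL_CounterRead();
        /* A countdown counter has wrapped if the new value is larger. */
        elapsed += (now <= last) ? last - now : last + period - now;
        last = now;
    }
    return elapsed;
}
```
+
+At 2 MHz a 100 microsecond delay needs 200 ticks, so the loop returns once at
+least 200 ticks have been observed, including across a wrap of the counter.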
+
+-- unsigned int HAL_NVMemoryType(void)
+
+Returns a flags word describing the NVMemory:
+      bits 0-7: 0 => no NVMemory available
+                1 => NVMemory may be available on the IIC bus
+                2 => NVMemory is available on the IIC bus, and the
+                     device characteristics are known
+                3 => the HAL provides NVMemory access calls.
+      bit 8:    NVMemory has a protected region at the end
+      bit 9:    Protected region is software deprotectable
+      bit 10:   Memory locations 0-15 are readable
+      bit 11:   Memory locations 0-15 are writeable
+
+If bits 0-7 are 0 or 1 no other NVMemory calls need be available, and bits
+8-31 should be zero.
+
+If bits 0-7 are 2, Size, ProtectedSize, Protection and IICAddress calls must
+be available.
+
+If bits 0-7 are 3, all calls except IICAddress must be available.
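+
+A caller might decode the flags word along the following lines. This is only
+a sketch: the macro and function names are invented for illustration, and
+treating type values above 3 as invalid is an assumption, since this draft
+does not allocate them.
+
```c
#include <assert.h>

#define NVM_TYPE_MASK      0xFFu       /* bits 0-7: NVMemory type */
#define NVM_PROTECTED      (1u << 8)   /* protected region at the end */
#define NVM_DEPROTECTABLE  (1u << 9)   /* region is software deprotectable */
#define NVM_LOW_READABLE   (1u << 10)  /* locations 0-15 are readable */
#define NVM_LOW_WRITEABLE  (1u << 11)  /* locations 0-15 are writeable */

/* Type field: 0 none, 1 may be on IIC, 2 on IIC with known
   characteristics, 3 HAL provides the access calls. */
unsigned int nvm_type(unsigned int flags)
{
    return flags & NVM_TYPE_MASK;
}

/* Nonzero if the flags word obeys the rules given above. */
int nvm_flags_valid(unsigned int flags)
{
    unsigned int type = nvm_type(flags);

    if (type > 3u)
        return 0;                    /* not allocated in this draft */
    if (type <= 1u)
        return (flags >> 8) == 0;    /* bits 8-31 should be zero */
    return 1;
}

/* Nonzero if the protected region exists and can be deprotected. */
int nvm_can_deprotect(unsigned int flags)
{
    assert(nvm_flags_valid(flags));
    return (flags & (NVM_PROTECTED | NVM_DEPROTECTABLE))
           == (NVM_PROTECTED | NVM_DEPROTECTABLE);
}
```
+
+For instance, a flags word of &303 describes HAL-provided access calls with a
+protected region that software can deprotect.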
+
+-- unsigned int HAL_NVMemorySize(void)
+
+Returns the number of bytes of non-volatile memory available. Bytes 0-15
+should be included in the count, so for example a Philips PCF8583 CMOS/RTC
+device (as used in the Archimedes and Risc PC) would be described as a
+256-byte device, with locations 0-15 not readable. More complex arrangements
+would have to be abstracted out by the HAL providing its own NVMemory access
+calls.
+
+This is to suit the current RISC OS Kernel, which does not use bytes 0-15.
+
+-- unsigned int HAL_NVMemoryProtectedSize(void)
+
+Returns the number of bytes of NVMemory that are protected. These should be
+at the top of the address space. The OS will not attempt to write to those
+locations without first requesting deprotection (if available). Returns 0 if
+bit 8 of the flags is clear.
+
+-- void HAL_NVMemoryProtection(bool)
+
+Enables (if true) or disables (if false) the protection of the software
+protectable region. Does nothing unless bits 8 and 9 of the flags word are
+both set.
+
+-- unsigned int HAL_NVMemoryIICAddress(void)
+
+Returns a word describing the addressing scheme of the NVRAM.
+      bits 0-7:  IIC address
+       
+This will always be on bus zero.
+
+-- int HAL_NVMemoryRead(unsigned int addr, void *buffer, unsigned int n)
+
+Reads n bytes of memory from address addr onwards into the buffer supplied.
+Returns the number of bytes successfully read. Under all normal
+circumstances the return value will be n - if it is not, a hardware failure
+is implied. Behaviour is undefined if the address range specified is outside
+the NVMemory, or inside bytes 0-15, if declared unavailable.
+
+-- int HAL_NVMemoryWrite(unsigned int addr, void *buffer, unsigned int n)
+
+Writes n bytes of memory to address addr onwards from the buffer supplied.
+Returns the number of bytes successfully written. Under all normal
+circumstances the return value will be n - if it is not, a hardware failure
+is implied. Behaviour is undefined if the address range specified is outside
+the NVMemory. Writes inside a protected region should be ignored.
+
+-- int HAL_IICBuses(void)
+
+Returns the number of IIC buses on the system.
+
+-- unsigned int HAL_IICType(int bus)
+
+Returns a flag word describing the specified IIC bus.
+        bit 0: Bus supplies the low-level API
+        bit 1: Bus supplies the high-level API
+        bit 2: High-level API supports multi-master operation
+        bit 3: High-level API supports slave operation
+       bit 16: Bus supports Fast (400kbps) operation
+       bit 17: Bus supports High-speed (3.4Mbps) operation
+   bits 20-31: Version number of IIC supported by high-level API, * 100.
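+
+The flag word might be decoded as in this C sketch. The macro and function
+names are illustrative, and the 100 kbps fallback is the IIC Standard mode
+rate, which is assumed here for buses that report neither speed bit.
+
```c
#include <assert.h>

#define IIC_LOW_LEVEL     (1u << 0)   /* bus supplies the low-level API */
#define IIC_HIGH_LEVEL    (1u << 1)   /* bus supplies the high-level API */
#define IIC_MULTI_MASTER  (1u << 2)   /* high-level API: multi-master */
#define IIC_SLAVE         (1u << 3)   /* high-level API: slave operation */
#define IIC_FAST          (1u << 16)  /* Fast (400kbps) operation */
#define IIC_HIGH_SPEED    (1u << 17)  /* High-speed (3.4Mbps) operation */

/* IIC version handled by the high-level API, times 100 (210 means 2.1).
   Returns 0 if the bus has no high-level API. */
unsigned int iic_version_x100(unsigned int flags)
{
    if (!(flags & IIC_HIGH_LEVEL))
        return 0;
    return flags >> 20;               /* bits 20-31 */
}

/* Fastest bit rate the bus reports, in kbps. */
unsigned int iic_max_kbps(unsigned int flags)
{
    /* A usable bus offers at least one of the two APIs. */
    assert((flags & (IIC_LOW_LEVEL | IIC_HIGH_LEVEL)) != 0);
    if (flags & IIC_HIGH_SPEED)
        return 3400u;
    if (flags & IIC_FAST)
        return 400u;
    return 100u;                      /* assumed Standard mode */
}
```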
+
+
+-- __value_in_regs IICLines HAL_IICSetLines(int bus, IICLines lines)
+
+Sets the SDA and SCL lines on the specified bus. A 0 value represents logic
+LOW, 1 logic HIGH. The function then reads back and returns the values
+present on the bus, to permit arbitration.
+
+Note the "__value_in_regs" keyword, which signifies that the binary ABI
+expects SDA and SCL to be returned in registers a1 and a2.
+
+-- __value_in_regs IICLines HAL_IICReadLines(int bus)
+
+Reads the state of the IIC lines on the specified bus, without changing
+their state.
+
+Note the "__value_in_regs" keyword, which signifies that the binary ABI
+expects SDA and SCL to be returned in registers a1 and a2.
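+
+To show how the OS IIC engine might use the read-back from HAL_IICSetLines,
+here is a sketch of generating a START condition. HAL_IICSetLines is
+simulated (bus 0 is idle, bus 1 has another device holding SDA low), the
+compiler-specific __value_in_regs keyword is left out so that the fragment
+builds with any C compiler, and iic_start is a hypothetical name. Real code
+would also need the bus timing delays, which are omitted here.
+
```c
#include <assert.h>

typedef struct { int SDA, SCL; } IICLines;

/* Simulated HAL entry: the lines are open-drain, so a line reads high only
   if nobody is pulling it low. On bus 1 another device holds SDA low. */
IICLines HAL_IICSetLines(int bus, IICLines lines)
{
    IICLines seen;

    seen.SDA = lines.SDA && (bus != 1);
    seen.SCL = lines.SCL;
    return seen;
}

/* Try to send a START condition. Returns 1 on success, or 0 if the values
   read back show that the bus is not ours to drive. */
int iic_start(int bus)
{
    IICLines want = { 1, 1 };
    IICLines got;

    assert(bus >= 0);
    got = HAL_IICSetLines(bus, want);     /* release both lines */
    if (!got.SDA || !got.SCL)
        return 0;                         /* bus busy: back off */

    want.SDA = 0;                         /* SDA falls while SCL is high */
    got = HAL_IICSetLines(bus, want);
    if (got.SDA || !got.SCL)
        return 0;                         /* lines did not follow */

    want.SCL = 0;                         /* SCL low, ready for first bit */
    HAL_IICSetLines(bus, want);
    return 1;
}
```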
+
+-- int HAL_VideoFlybackDevice(void)
+
+Returns the device number of the video flyback interrupt. [Note: HAL
+interrupt API possibly subject to change, may affect this call.]
+
+-- void HAL_Video_SetMode(const void *VIDCList3)
+
+Programs the video controller to initialise a display mode. RISC OS passes a
+standard VIDC List Type 3 as specified in PRM 5a-125. Note that this is a
+generic video controller list, and so VIDC in this context does not refer to
+any specific devices such as Acorn VIDC20.
+
+The HAL is expected to set the video controller timings on this call. Any
+palette, pixel DMA and hardware cursor settings are controlled via other
+calls.
+
+-- void HAL_Video_WritePaletteEntry(uint type, uint pcolour, uint index)
+
+Writes a single palette entry to the video controller.
+
+  type     = 0 for normal palette entry
+             1 for border colour
+             2 for pointer colour
+          >= 3 reserved
+
+  pcolour  = palette entry colour in BBGGRRSS format (Blue,Green,Red,Supremacy)
+
+  index    = index of entry
+
+Indices are in the range 0..255 for normal, 0 for border, 0..3 for pointer
+colours. Note that RISC OS only makes calls using 1..3 for the pointer, and
+pointer colour 0 is assumed to be transparent.
+
+-- void HAL_Video_WritePaletteEntries(uint type, const uint *pcolours, 
+                                      uint index, uint Nentries)
+
+Writes a block of palette entries to the video controller.
+
+  type     = 0 for normal palette entry
+             1 for border colour
+             2 for pointer colour
+          >= 3 reserved
+
+  pcolours = pointer to block of palette entry colours in BBGGRRSS format
+             (Blue,Green,Red,Supremacy)
+
+  index    = start index in palette (for first entry in block)
+
+  Nentries = number of entries in block (must be >= 1)
+
+Indices are in the range 0..255 for normal, 0 for border, 0..3 for pointer
+colours. Note that RISC OS only makes calls using 1..3 for the pointer, and
+pointer colour 0 is assumed to be transparent.
+
+-- uint HAL_Video_ReadPaletteEntry(uint type, uint pcolour, uint index)
+
+Returns the effective palette entry after taking into account any hardware
+restrictions in the video controller, assuming it was originally programmed
+with the value pcolour.
+
+  type     = 0 for normal palette entry
+             1 for border colour
+             2 for pointer colour
+          >= 3 reserved
+
+  pcolour  = palette entry colour in BBGGRRSS format (Blue,Green,Red,Supremacy)
+
+  index    = index of entry
+
+  returns  : effective BBGGRRSS
+
+Indices are in the range 0..255 for normal, 0 for border, 0..3 for pointer
+colours. Note that RISC OS only makes calls using 1..3 for the pointer, and
+pointer colour 0 is assumed to be transparent.
+
+Depending on hardware capabilities, HALs may have to remember current
+settings (eg. bits per pixel) or keep soft copies of entries. Because this
+call supplies the original pcolour, this need is minimised (some HALs can
+just return pcolour or a directly modified pcolour).
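+
+As an illustration of "a directly modified pcolour", the sketch below models
+a hypothetical controller that stores only the top 4 bits of each colour
+component and has no supremacy bit. Expanding each stored nibble by
+replication is an assumption about that imaginary hardware, not a
+requirement of this API.
+
```c
#include <assert.h>

typedef unsigned int uint;

/* Effective BBGGRRSS value for a controller with 4 bits per component and
   no supremacy support. No soft copy is needed, because the caller
   supplies the value that was originally programmed. */
uint HAL_Video_ReadPaletteEntry(uint type, uint pcolour, uint index)
{
    uint top;

    (void)index;
    assert(type <= 2);               /* types >= 3 are reserved */
    top = pcolour & 0xF0F0F000u;     /* top nibble of B, G, R; S dropped */
    return top | (top >> 4);         /* replicate each nibble downwards */
}
```
+
+For example, a pcolour of &12345678 would be reported as &11335500.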
+
+-- void HAL_Video_SetInterlace(uint interlace)
+
+Sets the video interlaced sync.
+
+  interlace = 0 or 1 for interlace off or on
+              (all other values reserved)
+
+-- void HAL_Video_SetBlank(uint blank, uint DPMS)
+
+  blank = 0 or 1 for unblank or blank
+          (all other values reserved)
+
+  DPMS  = 0..3 as specified by monitor DPMSState (from mode file)
+          0 for no DPMS power saving
+
+The HAL is expected to attempt to turn syncs off according to DPMS, and to
+turn video DMA off for blank (and therefore on for unblank) if possible. The
+HAL is not expected to do anything else, eg. blank all palette entries. Such
+things are the responsibility of the OS, and also this call is expected to
+be fast. May be called with interrupts off.
+
+-- void HAL_Video_SetPowerSave(uint powersave)
+
+  powersave = 0 or 1 for power save off or on
+              (all other values reserved)
+
+The HAL is expected to perform any reasonable measures on the video
+controller to save power (eg. turn off DACs), when the display is assumed
+not to be required. Blanking is handled by a separate call.
+
+[What does this really mean. What is acceptable and safe for displays? ]
+
+-- void HAL_Video_UpdatePointer(uint flags, int x, int y, const shape_t *shape)
+
+Update the displayed position of the current pointer shape (or turn shape
+off). This call is made by the OS at a time to allow smoothly displayed
+changes (on a VSync).
+
+  flags:
+    bit 0  = pointer display enable (0=off, 1=on)
+    bit 1  = pointer shape update (0=no change, 1=updated)
+    bits 2..31 reserved (0)
+
+  x = x position of top left of pointer (x = 0 for left of display)
+
+  y = y position of top left of pointer (y = 0 for top of display)
+
+  shape points to shape_t descriptor block:
+    typedef struct shape_t
+    {
+      uint8   width;      /* unpadded width in bytes (see notes) */
+      uint8   height;     /* in pixels */
+      uint8   padding[2]; /* 2 bytes of padding for field alignment */
+      void   *buffLA;     /* logical address of buffer holding pixel data */
+      void   *buffPA;     /* corresponding physical address of buffer */
+    } shape_t;
+
+Notes:
+1) if flags bit 0 is 0 (pointer off), x, y, shape are undefined
+2) the shape data from RISC OS is always padded with transparent pixels
+   on the rhs, to a width of 32 pixels (8 bytes)
+3) pointer clipping is the responsibility of the HAL (eg. may be able to
+   allow display of pointer in border region on some h/w)
+4) buffer for pixel data is aligned to a multiple of 256 bytes or better
+
+The HAL may need to take note of the shape updated flag, and make its own
+new copies if true. This is to handle cases like dual scan LCD pointer,
+which typically needs two or more shape buffers for the hardware, or
+possibly to handle clipping properly. This work should only be done when the
+updated flag is true.
+
+A simple HAL, where hardware permits, can use the shape data in the buffer
+directly, ignoring the updated flag. The OS guarantees that the buffer data
+is valid for the whole time it is to be displayed.
+
+-- void HAL_Video_SetDAG(uint DAG, uint paddr)
+
+Set the video DMA address generator value to the given physical address.
+
+  DAG   = 0 set start address of current video display
+          1 set start address of total video buffer
+          2 set end address (exclusive) of total video buffer
+          all other values reserved
+
+  paddr = physical address for given DAG
+
+The OS has a video buffer which is >= total display size, and may be using
+bank switching (several display buffers) or hardware scroll within the total
+video buffer.
+
+  DAG=1 will be start address of current total video buffer
+  DAG=2 will be end address (exclusive) of current total video buffer
+  DAG=0 will be start address in buffer for current display
+
+HALs should respond differently depending on whether hardware scroll is
+supported or not. (The OS will already know this from HAL_Video_Features).
+
+No hardware scroll:
+Only DAG=0 is significant, and the end address of the current display is
+implied by the size of the current mode. Calls with DAG=1,2 should be
+ignored.
+
+Hardware scroll:
+DAG=0 again defines display start. DAG=2 defines the last address
+(exclusive) that should be displayed before wrapping back (if reached within
+display size), and DAG=1 defines the address to which accesses should wrap
+back.
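+
+The wrap behaviour for hardware scroll can be expressed as a small address
+calculation. This is a sketch of the intended result only; real hardware
+does the wrap itself, and scroll_address is a hypothetical helper.
+
```c
#include <assert.h>

typedef unsigned int uint;

/* Physical address fetched for the byte at the given offset into the
   display, for hardware programmed with the three DAG values:
     dag0 = start of current display
     dag1 = start of total video buffer (wrap target)
     dag2 = end of total video buffer (exclusive) */
uint scroll_address(uint dag0, uint dag1, uint dag2, uint offset)
{
    uint size = dag2 - dag1;

    /* The display start must lie inside the total video buffer. */
    assert(dag1 <= dag0 && dag0 < dag2);
    return dag1 + (dag0 - dag1 + offset) % size;
}
```
+
+With a buffer from &1000 to &2000 and a display start of &1C00, offset &3FF
+is fetched from &1FFF and offset &400 wraps back to &1000.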
+
+-- int HAL_Video_VetMode(const void *VIDClist, const void *workspace)
+
+Allows HAL to vet a proposed mode.
+
+[What does this really do, and what can HAL do. Are we going to allow
+changes to VIDCList by HAL, ie. not const. Is mode workspace really ok to
+pass to HAL ???]
+
+  VIDClist  -> generic video controller list (VIDC list type 3)
+
+  workspace -> mode workspace (if mode number), or 0
+
+  returns 0 if OK (may make minor adjustments to VIDClist and/or workspace)
+          non-zero if not OK
+
+
+-- uint HAL_Video_Features(void)
+
+Determine key features supported by the video hardware.
+
+  returns a flags word:
+     bit 0     hardware scroll is supported
+     bit 1     hardware pointer/cursor is supported
+     bit 2     interlace is supported with progressive framestore
+     other bits reserved (returned as 0)
+
+Bits are set for true. If bit 2 is true, then the OS assumes that a simple
+progressive framestore layout is sufficient for an interlaced display (ie.
+that the hardware implements the interlaced scan).
+
+-- uint HAL_Video_PixelFormats(void)
+
+Determine the pixel formats that are supported by the hardware.
+
+  returns flags word:
+     bit 0     1 bpp is supported
+     bit 1     2 bpp is supported
+     bit 2     4 bpp is supported
+     bit 3     8 bpp is supported
+     bit 4    16 bpp is supported
+     bit 5    32 bpp is supported
+     other bits reserved (returned as 0)
+
+Bits are set for true. Bits 0-5 refer to support with standard RISC OS pixel
+layout (such as little endian packing for 1,2,4 bpp, 5-5-5 RGB for 16 bpp,
+etc). See the section discussing Video for more information. Other formats
+may be introduced when/if RISC OS supports them.
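+
+A caller could test the returned word as sketched here. Both function names
+are hypothetical, and the "lowest supported depth at or above the one asked
+for" rule is only an example policy, not a statement of what the kernel
+does.
+
```c
#include <assert.h>

typedef unsigned int uint;

/* Nonzero if the given depth is flagged as supported. Bit n of the flags
   word corresponds to a depth of 2^n bits per pixel, for n = 0..5. */
int bpp_supported(uint formats, uint bpp)
{
    uint bit;

    for (bit = 0; bit <= 5; bit++)
        if ((1u << bit) == bpp)
            return (int)((formats >> bit) & 1u);
    return 0;                        /* not one of 1,2,4,8,16,32 */
}

/* Lowest supported depth that is at least bpp, or 0 if there is none. */
uint promote_bpp(uint formats, uint bpp)
{
    assert(bpp >= 1);
    for (; bpp <= 32; bpp *= 2)
        if (bpp_supported(formats, bpp))
            return bpp;
    return 0;
}
```
+
+For hardware reporting &38 (8, 16 and 32 bpp only), a request for 1 bpp
+would be promoted to 8 bpp under this policy.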
+
+-- uint HAL_Video_BufferAlignment(void)
+
+Determine the framestore buffer alignment required by the hardware.
+
+  returns an unsigned integer:
+    the required alignment for the framestore buffer, in bytes
+    (expected to be a power of 2)
+
+
+-- HAL_MatrixColumns
+
+???
+
+-- HAL_MatrixScan
+
+???
+
+-- HAL_TouchscreenType
+
+???
+
+-- HAL_TouchscreenRead
+
+???
+
+-- unsigned int64 HAL_MachineID(void)
+
+Returns a 64-bit unique machine identifier. What does it mean? ...
+
+-- void *HAL_ControllerAddress(unsigned flags, unsigned controller)
+
+Maps to RISC OS' OS_Memory 9 call - provides a way for people who must poke
+the hardware to find it. Bits 0-7 of controller are the sequence number
+(starting at zero), and bits 8-31 are the controller type. Currently
+allocated types are:
+
+      0 = EASI card access speed control register (sequence no = card)
+      1 = EASI space (sequence no = card)
+      2 = VIDC1
+      3 = VIDC20
+      4 = IOMD
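+
+The controller argument packs two fields into one word, which can be
+sketched in C as below. The enum and function names are made up for this
+example; only the bit layout and the type numbers come from the text above.
+
```c
#include <assert.h>

enum {
    CTRL_EASI_SPEED = 0,   /* EASI card access speed control register */
    CTRL_EASI_SPACE = 1,   /* EASI space */
    CTRL_VIDC1      = 2,
    CTRL_VIDC20     = 3,
    CTRL_IOMD       = 4
};

/* Build the controller word: bits 0-7 sequence number, bits 8-31 type. */
unsigned int controller_word(unsigned int type, unsigned int sequence)
{
    assert(type < (1u << 24) && sequence < 256u);
    return (type << 8) | sequence;
}

/* Split a controller word back into its two fields. */
unsigned int controller_type(unsigned int word)     { return word >> 8; }
unsigned int controller_sequence(unsigned int word) { return word & 0xFFu; }
```
+
+So EASI space for card 2 would be requested with a controller value of &102.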
+
+        HALEntry HAL_HardwareInfo
+        HALEntry HAL_SuperIOInfo
+
+
+RISC OS entry points from HAL init
+----------------------------------
+
+These are entry points into the OS, called from the HAL.
+
+-- void RISCOS_InitARM(unsigned int flags)
+
+    flags: reserved - sbz
+
+On entry:
+  SVC mode
+  MMU and caches off
+  IRQs and FIQs disabled
+  No RAM or stack used
+
+On exit:
+  Instruction cache may be on
+
+This routine must be called once very early on in the HAL start-up, to
+accelerate the CPU for the rest of HAL initialisation. Typically, it will
+just enable the instruction cache (if possible on the ARM in use), and
+ensure that the processor is in 32-bit configuration and mode.
+
+Some architecture 4 (and later) ARMs have bits in the control register that
+affect the hardware layer - eg the iA and nF bits in the ARM920T. These are
+the HAL's responsibility - the OS will not touch them. Conversely, the HAL
+should not touch the cache, MMU and core configuration bits (currently bits
+0-14).
+
+On architecture 3, the control register is write only - the OS will set bits
+11-31 to zero.
+
+Likewise, such things as the StrongARM 110's register 15 (Test, Clock and
+Idle Control) are the HAL's responsibility. The OS does not know about the
+configuration of the system, so cannot program such registers.
+
+This entry must not be called after RISCOS_Start.
+
+-- void *RISCOS_AddRAM(unsigned int flags, void *start, void *end, 
+                       uintptr_t sigbits, void *ref)
+   flags
+        bit 0: video memory (only first contiguous range will be used)
+        bits 8-11: speed indicator (arbitrary, higher => faster)
+        other bits reserved (SBZ)
+   start
+        start address of RAM (inclusive) (no alignment requirements)
+   end
+        end address of RAM (exclusive) (no alignment requirements, but must be >= start)
+   sigbits
+        significant address bit mask (1 => this bit of addr decoded, 0 => this bit ignored)
+   ref
+        reference handle (NULL for first call)
+
+Returns ref for next call
+
+On entry:
+  SVC32 mode
+  MMU and data cache off
+  IRQs and FIQs disabled
+
+This entry point must be the first call from the HAL to RISC OS following a hardware
+reset. It may be called as many times as necessary to enumerate all RAM that
+is available for general purpose use. It should only be called to declare video
+memory if the video memory may be used as normal RAM when in small video modes.
+
+To permit software resets:
+    The HAL must be non-destructive of any declared RAM outside the first 4K of the first
+    block.
+    The stack pointer should be initialised 4K into the first block, or in some non-
+    declared RAM.
+    Must present memory in a fixed order on any given system.
+
+The first block must be at least 256K and 16K aligned.
+Block coalescing only works well if RAM banks are added in ascending address order.
+
+RISC OS will use RAM at the start of the first block as initial workspace.
+Max usage is 16 bytes per block + 32 (currently 8 per block + 4). This
+limits the number of discontiguous blocks (although RISC OS will concatenate
+contiguous blocks where possible).
+
+This call must not be made after RISCOS_Start.
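+
+The way the ref handle is threaded through successive calls can be sketched
+as follows. RISCOS_AddRAM is replaced here by a stand-in that simply records
+blocks in a table and merges contiguous ones, so that the fragment runs on a
+host machine; the real routine's internals are not described by this
+document. The addresses and hal_declare_ram are likewise hypothetical.
+
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_BLOCKS 8
static struct { uintptr_t start, end; } ram_table[MAX_BLOCKS];
static size_t ram_blocks;

/* Stand-in for the OS entry point: records each block, merging a block
   that starts exactly where the previous one ended. */
void *RISCOS_AddRAM(unsigned int flags, void *start, void *end,
                    uintptr_t sigbits, void *ref)
{
    uintptr_t s = (uintptr_t)start, e = (uintptr_t)end;

    (void)flags; (void)sigbits;
    assert(e >= s);                        /* end must be >= start */
    if (ref == NULL)
        ram_blocks = 0;                    /* first call after a reset */
    if (ram_blocks > 0 && ram_table[ram_blocks - 1].end == s) {
        ram_table[ram_blocks - 1].end = e; /* contiguous: merge */
    } else {
        assert(ram_blocks < MAX_BLOCKS);
        ram_table[ram_blocks].start = s;
        ram_table[ram_blocks].end = e;
        ram_blocks++;
    }
    return ram_table;                      /* ref for the next call */
}

/* HAL side: declare three banks in ascending address order, passing NULL
   as ref on the first call and the returned ref thereafter. Returns the
   number of distinct blocks recorded by the stand-in. */
size_t hal_declare_ram(void)
{
    const uintptr_t all_bits = ~(uintptr_t)0;  /* every address bit decoded */
    void *ref = NULL;

    ref = RISCOS_AddRAM(0, (void *)(uintptr_t)0x10000000u,
                        (void *)(uintptr_t)0x10800000u, all_bits, ref);
    ref = RISCOS_AddRAM(0, (void *)(uintptr_t)0x10800000u,
                        (void *)(uintptr_t)0x11000000u, all_bits, ref);
    ref = RISCOS_AddRAM(0, (void *)(uintptr_t)0x20000000u,
                        (void *)(uintptr_t)0x20400000u, all_bits, ref);
    (void)ref;       /* a real HAL would now pass ref to RISCOS_Start */
    return ram_blocks;
}
```
+
+Here the first two banks are contiguous and are merged, so two blocks remain.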
+
+
+-- void RISCOS_Start(unsigned int flags, int *riscos_header,
+                     int *hal_entry_table, void *ref)
+
+   flags
+        bit 0: power on reset
+        bit 1: CMOS reset inhibited (eg protection link on Risc PC)
+        bit 2: perform a CMOS reset (if bit 1 clear and bit 0 set - eg front panel
+                                     button held down on an NC)
+
+On entry:
+  SVC32 mode
+  MMU and data cache off
+  IRQs and FIQs disabled
+
+This routine must be called after all calls to RISCOS_AddRAM have been
+completed. It does not return. Future calls back to the HAL are via the HAL
+entry table, after the MMU has been enabled.
+
+
+-- void *RISCOS_MapInIO(unsigned int flags, void *phys, unsigned int size)
+
+   flags: bit 2 => make memory bufferable
+    phys: physical address to map in
+    size: number of bytes of memory to map in
+
+This routine is used to map in IO memory for the HAL's usage. Normally it
+would only be called during HAL_Init(). Once mapped in the IO space cannot
+be released.
+
+It returns the resultant virtual address corresponding to phys, or 0 for
+failure. Failure can only occur if no RAM is available for page tables, or
+if the virtual address space is exhausted.
+
+-- void *RISCOS_AccessPhysicalAddress(unsigned int flags, void *phys, void **oldp)
+
+   flags: bit 2 => make memory bufferable
+          other bits must be zero
+    phys: physical address to access
+    oldp: pointer to location to store old state (or NULL)
+
+On entry:
+  Privileged mode
+  MMU on
+  FIQs on
+  Re-entrant
+
+On exit:
+  Returns logical address corresponding to phys
+
+Arranges for the physical address phys to be mapped in to logical memory. In
+fact, the whole megabyte containing "phys" is mapped in (ie if phys =
+&12345678, then &12300000 to &123FFFFF become available). The memory is
+supervisor access only, non-cacheable, non-bufferable by default, and will
+remain available until the next call to RISCOS_Release/AccessPhysicalAddress
+(although interrupt routines or subroutines may temporarily map in something
+else).
+
+When finished, the user should call RISCOS_ReleasePhysicalAddress.
+
+-- void RISCOS_ReleasePhysicalAddress(void *old)
+
+  old: state returned from a previous call to RISCOS_AccessPhysicalAddress
+
+On entry:
+  MMU on
+  FIQs on
+  Re-entrant
+
+Usage:
+  Call with the value output from a previous RISCOS_AccessPhysicalAddress.
+
+Example:
+
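+  void *old;
+  unsigned int *addr = (unsigned int *) 0x80005000;
+  unsigned int *addr2 = (unsigned int *) 0x90005000;
+
+  /* Map in the first address (flags = 0), saving the previous state */
+  addr = (unsigned int *) RISCOS_AccessPhysicalAddress(0, addr, &old);
+  addr[0] = 3; addr[1] = 5;
+
+  /* Map in a second address; the first mapping is no longer valid */
+  addr2 = (unsigned int *) RISCOS_AccessPhysicalAddress(0, addr2, NULL);
+  *addr2 = 7;
+
+  /* Restore the state that was saved by the first call */
+  RISCOS_ReleasePhysicalAddress(old);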
diff --git a/VersionASM b/VersionASM
index 1e37868398fd9379e4150d4584b0c3b6a0391423..095b5cdf02a123d842cf3364706b7c3098f4dc6a 100644
--- a/VersionASM
+++ b/VersionASM
@@ -13,12 +13,12 @@
                         GBLS    Module_ComponentPath
 Module_MajorVersion     SETS    "5.35"
 Module_Version          SETA    535
-Module_MinorVersion     SETS    "4.79.2.14"
-Module_Date             SETS    "09 Jan 2001"
-Module_ApplicationDate2 SETS    "09-Jan-01"
-Module_ApplicationDate4 SETS    "09-Jan-2001"
+Module_MinorVersion     SETS    "4.79.2.15"
+Module_Date             SETS    "12 Jan 2001"
+Module_ApplicationDate2 SETS    "12-Jan-01"
+Module_ApplicationDate4 SETS    "12-Jan-2001"
 Module_ComponentName    SETS    "Kernel"
 Module_ComponentPath    SETS    "RiscOS/Sources/Kernel"
-Module_FullVersion      SETS    "5.35 (4.79.2.14)"
-Module_HelpVersion      SETS    "5.35 (09 Jan 2001) 4.79.2.14"
+Module_FullVersion      SETS    "5.35 (4.79.2.15)"
+Module_HelpVersion      SETS    "5.35 (12 Jan 2001) 4.79.2.15"
                         END
diff --git a/VersionNum b/VersionNum
index ffc1a7eadb0301611a8380f6c5f039ddd1a29ab3..686bfb8b6ff1db59f969b92524231593da1dc8d9 100644
--- a/VersionNum
+++ b/VersionNum
@@ -4,19 +4,19 @@
  *
  */
 #define Module_MajorVersion_CMHG        5.35
-#define Module_MinorVersion_CMHG        4.79.2.14
-#define Module_Date_CMHG                09 Jan 2001
+#define Module_MinorVersion_CMHG        4.79.2.15
+#define Module_Date_CMHG                12 Jan 2001
 
 #define Module_MajorVersion             "5.35"
 #define Module_Version                  535
-#define Module_MinorVersion             "4.79.2.14"
-#define Module_Date                     "09 Jan 2001"
+#define Module_MinorVersion             "4.79.2.15"
+#define Module_Date                     "12 Jan 2001"
 
-#define Module_ApplicationDate2         "09-Jan-01"
-#define Module_ApplicationDate4         "09-Jan-2001"
+#define Module_ApplicationDate2         "12-Jan-01"
+#define Module_ApplicationDate4         "12-Jan-2001"
 
 #define Module_ComponentName            "Kernel"
 #define Module_ComponentPath            "RiscOS/Sources/Kernel"
 
-#define Module_FullVersion              "5.35 (4.79.2.14)"
-#define Module_HelpVersion              "5.35 (09 Jan 2001) (4.79.2.14)"
+#define Module_FullVersion              "5.35 (4.79.2.15)"
+#define Module_HelpVersion              "5.35 (12 Jan 2001) (4.79.2.15)"
diff --git a/s/ARM600 b/s/ARM600
index fc0c0ddd67fb48ed855f6803e7e8338dd8aa4db2..b4e1e1750cc8f6e92ec65489a4fd89e98cd40070 100644
--- a/s/ARM600
+++ b/s/ARM600
@@ -2392,6 +2392,7 @@ MMUControl_Flush
        TST      r0,#&80000000
        BEQ      MMUC_flush_flushT
        ARMop    Cache_CleanInvalidateAll,,,r1
+       LDR      r0, [sp]
 MMUC_flush_flushT
        TST      r0,#&40000000
        BEQ      MMUC_flush_done
diff --git a/s/vdu/vdudriver b/s/vdu/vdudriver
index 4300064a96ad45c3e0655a9742f83a27b8a3b54a..30639cf7f8850dc3ec412fbd42321ccb5468099a 100644
--- a/s/vdu/vdudriver
+++ b/s/vdu/vdudriver
@@ -168,15 +168,12 @@ VduInit ROUT
         STR     r0, [r4, #HWPixelFormats]
         mjsCallHAL HAL_Video_Features
         STR     r0, [r4, #HWVideoFeatures]
-        mjsCallHAL HAL_Video_Features
-        STR     r0, [r4, #HWPixelFormats]
         mjsCallHAL HAL_Video_BufferAlignment
         STR     r0, [r4, #HWBufferAlign]
         Pull    "r4, r9, r12"
 
         ;;; sort this out!
-        ! 0, "mjsHAL not doing anything useful with HAL_Video_PixelFormats"
-        ! 0, "mjsHAL not doing anything useful with HAL_Video_bufferAlign"
+        ! 0, "mjsHAL not doing anything useful with HAL_Video_BufferAlignment"
         ! 0, "mjsHAL not dealing with lack of h/w pointer"
 
         LDR     R0, =RangeC+SpriteReason_SwitchOutputToSprite
@@ -607,6 +604,75 @@ CursorNbitTab
         &       Cursor16bit-CursorNbitTab
         &       Cursor32bit-CursorNbitTab
 
+; table of substitute mode numbers to cater for hardware that might
+; not support all of 1,2,4,8 bpp (bits per pixel) modes
+;
+; indexed by mode number (0..49), pairs of byte values:
+;   bpp    = bits per pixel of this mode number
+;   promo  = promoted mode number (0..49), or &FF if none
+;
+; promoted number is:
+;  1) same resolution at next higher bpp (up to 8), if available, or
+;  2) similar resolution at 8 bpp (8 bpp should be available on most h/w)
+;
+ModePromoTable
+;
+;          bpp promo       mode no.
+;
+      DCB    1,    8     ;  0
+      DCB    2,    9     ;  1
+      DCB    4,   10     ;  2
+      DCB    1,   15     ;  3
+      DCB    1,    1     ;  4
+      DCB    2,    2     ;  5
+      DCB    1,   13     ;  6
+      DCB    4,   13     ;  7
+      DCB    2,   12     ;  8
+      DCB    4,   13     ;  9
+      DCB    8,  &FF     ; 10
+      DCB    2,   14     ; 11
+      DCB    4,   15     ; 12
+      DCB    8,  &FF     ; 13
+      DCB    4,   15     ; 14
+      DCB    8,  &FF     ; 15
+      DCB    4,   24     ; 16
+      DCB    4,   24     ; 17
+      DCB    1,   19     ; 18
+      DCB    2,   20     ; 19
+      DCB    4,   21     ; 20
+      DCB    8,  &FF     ; 21
+      DCB    4,   36     ; 22
+      DCB    1,   28     ; 23
+      DCB    8,  &FF     ; 24
+      DCB    1,   26     ; 25
+      DCB    2,   27     ; 26
+      DCB    4,   28     ; 27
+      DCB    8,  &FF     ; 28
+      DCB    1,   30     ; 29
+      DCB    2,   31     ; 30
+      DCB    4,   32     ; 31
+      DCB    8,  &FF     ; 32
+      DCB    1,   34     ; 33
+      DCB    2,   35     ; 34
+      DCB    4,   36     ; 35
+      DCB    8,  &FF     ; 36
+      DCB    1,   38     ; 37
+      DCB    2,   39     ; 38
+      DCB    4,   40     ; 39
+      DCB    8,  &FF     ; 40
+      DCB    1,   42     ; 41
+      DCB    2,   43     ; 42
+      DCB    4,   28     ; 43
+      DCB    1,   45     ; 44
+      DCB    2,   46     ; 45
+      DCB    4,   15     ; 46
+      DCB    8,  &FF     ; 47
+      DCB    4,   49     ; 48
+      DCB    8,  &FF     ; 49
+;
+      ALIGN
+
+
 ; *****************************************************************************
 ;
 ;       SYN - Perform MODE change
@@ -634,6 +700,39 @@ VduBadExit                              ; jumped to if an error in VDU code
 ModeChangeSub ROUT
         Push    lr
 
+        ;If it's a common mode number (0..49), consider a possible mode number
+        ;substitution, if hardware does not support given bits per pixel.
+        ;We are vaguely assuming h/w supports at least 8 bpp, otherwise we may
+        ;not be able to find a usable mode number, and later code may not handle
+        ;that well. This is probably ok, 8 bpp is almost universal.
+        ;
+        CMP     r2, #256
+        BHS     mchsub_3
+        AND     r1, r2, #&7F
+        CMP     r1, #50                      ; mode number
+        BHS     mchsub_3
+        Push    "r3, r4"
+        ADR     lr, ModePromoTable           ; table of mode promotions
+        LDR     r4, [WsPtr, #HWPixelFormats] ; bits 0 to 3 set for 1,2,4,8 bpp supported
+mchsub_1
+        MOV     r1, r1, LSL #1
+        LDRB    r3, [lr, r1]                 ; bpp for this mode number (1,2,4,8)
+        TST     r3, r4                       ; supported in h/w?
+        ANDNE   r2, r2, #&80                 ; if yes, take mode number that passed
+        ORRNE   r2, r2, r1, LSR #1
+        BNE     mchsub_2
+        ADD     r1, r1, #1                   ; else look for promotion
+        LDRB    r1, [lr, r1]                 ; new mode number
+        CMP     r1, #&FF                     ; &FF if none
+        BNE     mchsub_1
+        ;alright, don't panic, just try to get a VGA-like mode of any bpp, if not tried already
+        CMP     r1, #28                      ; VGA 8 bpp
+        MOVNE   r1, #25                      ; VGA 1 bpp
+        BNE     mchsub_1
+mchsub_2
+        Pull    "r3, r4"
+;
+mchsub_3
         MOV     R1, #Service_PreModeChange
         IssueService
         TEQ     R1, #0                  ; was service claimed ?
diff --git a/s/vdu/vduswis b/s/vdu/vduswis
index 9362f04552588e3b6ed349f8e9855a051c23a1bb..cfe1f8411e63d2a52933cc8cf4496c6286f38655 100644
--- a/s/vdu/vduswis
+++ b/s/vdu/vduswis
@@ -783,23 +783,25 @@ FindOKMode ROUT
         BNE     %FT05
 
 ; service claimed
-; mjs Kernel/HAL split
-; call HAL vetting routine to possibly adjust parameters (or if desperate, to disallow mode)
-
-;;;mjsHAL - is the mode workspace suitably generic to be passed to HAL?
 
-        ; int HAL_VetMode(void *VIDClist, void *workspace)
-        ;
-        ; VIDClist  -> generic video controller list (VIDC list type 3)
-        ; workspace -> mode workspace (if mode number), or 0
-        ; returns 0 if OK (may be minor adjusts to VIDClist and/or workspace values)
-        ;         non-zero if not OK
-        ;
+; mjs Kernel/HAL split
+; call HAL vetting routine to possibly disallow mode
+;
         Push "r0-r3, r9, r12"
         MOV   r0,r3
         MOV   r1,r4
+        ;we'll do the vet on whether h/w supports the pixel depth ourselves
+        LDR   r2,[r0,#VIDCList3_PixelDepth]
+        MOV   r3,#1
+        MOV   r3,r3,LSL r2                 ; bits per pixel
+        LDR   r2,[WsPtr,#HWPixelFormats]
+        TST   r3,r2
+        MOVEQ r0,#1
+        BEQ   %FT04                        ; not supported
+        ;now any vetting the HAL might want to do
         mjsAddressHAL
         mjsCallHAL    HAL_Video_VetMode
+04
         CMP   r0,#0
         Pull "r0-r3,r9,r12"
         BNE   %FT05         ; HAL says "Oi, Kernel, No!"
@@ -921,6 +923,13 @@ FindSubstitute Entry
         ADD     r13, r13, #PushedInfoSize
         CMP     r11, #4
         MOVCS   r11, #0
+        Push    "r2, r3"
+        LDR     r2, [WsPtr, #HWPixelFormats]    ; see if h/w supports this BPP
+        MOV     r3, #1
+        MOV     r3, r3, LSL r11
+        TST     r2, r3
+        MOVEQ   r11, #3                         ; if not, use 8 BPP (assumed best chance for a mode number)
+        Pull    "r2, r3"
         LDRB    r1, [r1, r11]
         CLRV
         EXIT