
2001 - a HAL API
----------------

mjs   12 Jan 2001   Early Draft  (mjs,kjb)


RISC OS Hardware Abstraction
============================

Background
----------

This document is concerned with low level developments of RISC OS in order
to support future ARM based platforms. Loosely, this has been considered as
creating a hardware abstraction layer, or HAL. This term is a useful
shorthand, but with the following caveats. Firstly, the HAL work is only
envisaged to provide a modest level of low level abstraction (at least for
the next OS generation). Secondly, significant non-HAL work, at all levels,
is required to make a useful next generation RISC OS.

Note that most of the hardware dependence of the OS is already confined to
low level code (essentially, the kernel and device drivers). Here we assume
that the OS is only expected to run on an ARM processor, and with somewhat
restricted choices of I/O hardware (eg. friendly video pixel formats).

Up to now (version 4), RISC OS has evolved while closely coupled to an ARM
processor core and to an Acorn proprietary chip set (video, memory, I/O). It
has remained highly hardware specific. For the purposes of further
investment in RISC OS, three key areas of hardware dependence must be
addressed: 32-bit clean operation, support for new ARM cores, and support for
various video, memory and I/O configurations. Without all of these, the OS
is essentially useless on foreseeable future hardware.


32-bit clean code
-----------------

All RISC OS code must run 32-bit clean on future releases. This is because
all ARM cores from ARM9 onwards (and also some ARM7 variants) have entirely
removed support for RISC OS's native 26-bit modes. Note that 32-bit clean
code is not precluded from working on the older ARM cores (back to ARM 610).

With more care, 32/26-bit agnostic code can be written to work back to ARM
2. This may be of interest to module and application code, but note that the
OS kernel itself is only expected to work back to ARM 610, since an MMU is
required.

Much of the work required is routine and has been done for the OS itself
(though long term weeding of consequent bugs is required). A 32-bit
compatible shared C library has been released in order to encourage
conversion of application code by third parties. This work is not part of
hardware abstraction and is not considered further in this document.

Support for newer ARM cores
---------------------------

ARM core support (including caches and MMU) has historically been coded in a
tailored way for one or two specific variants. Since version 3.7 this has
meant just two variants: ARM 6/7 and StrongARM SA110. A more generic
approach is required for the next generation. This aims both to support
several cores in a more structured way, and to cover minor variants (eg.
cache size) with the same support code. The natural approach is to set up
run-time vectors to a set of ARM support routines.

Note that it is currently assumed that the ARM MMU architecture will not
change radically in future ARM cores. Hence, the kernel memory management
algorithms remain largely unchanged. This is believed to be a reasonable
assumption, since the last major memory management change was with Risc PC
and ARM 610 (when the on-chip MMU was introduced).

ARM core support is confined almost entirely to the kernel, and is therefore
not strictly part of the HAL. The HAL will only be concerned with any
external factors such as clock selection. Only HAL aspects are considered
further in this document.

Hardware abstraction layer
--------------------------

A simple HAL is to be inserted underneath RISC OS. This will provide two
functions. Firstly, it will be responsible for initial system bootstrap,
much like a PC BIOS, and secondly it will provide simple APIs to allow
hardware access.

The HAL APIs are a thin veneer on top of the hardware. They are designed to
act as replacements for all the hardware knowledge and manipulation
performed by the RISC OS Kernel, together with some APIs that will allow
RISC OS driver modules to become more hardware independent. No attempt will
be made (at this stage) to perform such tasks as separating the video
drivers from the Kernel, for example.

One tricky design decision is the amount of abstraction to aim for. Too
little, and the system is not flexible enough; too much and HAL design is
needlessly complicated for simple hardware. The present design tries to err
on the side of too little abstraction. Extra, more abstract APIs can always
be added later. So, initially, for example, the serial device API will just
provide discovery, some capability flags and the base address of the UART
register set. This will be sufficient for the vast majority of devices. If
new hardware comes along later that isn't UART compatible, a new API can be
defined. Simple hardware can continue to just report UART base addresses.

The bulk of device driver implementation remains in RISC OS modules - the
difference is that the HAL will allow many device drivers to avoid direct
access to hardware. For example, PS2Driver can now use HAL calls to send and
receive bytes through the PS/2 ports, and thus is no longer tied to IOMD's
PS/2 hardware. Similarly, interrupt masking and unmasking, as performed by
any device vector claimant, is now a HAL call. Note that HAL calls are
normally performed via a Kernel SWI - alternatively the Kernel can return
the address of specific HAL routines. There is nothing to stop specific
drivers talking to hardware directly, as long as they accept that this will
tie them to specific devices.

This dividing line between the HAL and RISC OS driver modules is crucial. If
the HAL does everything, then we have achieved nothing - we have just as
much hardware dependent code - it's just in a different place. It is
important to place the dividing line as close to the hardware as possible,
to make it easy to design a HAL and to prevent large amounts of code 
duplication between HALs for different platforms.

The Kernel remains responsible for the ARM's MMU and all other aspects of
the CPU core. The HAL requires no knowledge of details of ARM
implementations, and thus any HAL implementation should work on any
processor from the ARM610 onwards.


HAL/OS layout and headers
-------------------------

The OS is linked to run at a particular base address. Pre-HAL OSes were
linked to run at <n>MB, that is on a MB alignment to allow efficient MMU
section mapping. For simplicity, the HAL/OS layout can allow a fixed maximum
size for the HAL, currently set at 64K. Then the OS base address will be
<n>MB+64K. This allows a HAL of up to 64K to be placed at the bottom of a
ROM below the OS, and the HAL/OS combination to still be section-mapped. A
ROM should be portable to hardware variants merely by replacing the 64K HAL
block.

A more flexible system would only sacrifice MMU mapping efficiency. The HAL
and OS could be placed in any desired way, provided that each is contiguous
in physical memory.

The OS starts with a header including a magic word - this aids probing and
location of images. The OS header format is defined as:

Word 0: Magic word ("OSIm" - &6D49534F)
Word 1: Flags (currently should be 0)
Word 2: Image size (bytes)
Word 3: Offset (bytes) from OS base to table of OS routine entry points
Word 4: Number of entries in table
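
As a minimal sketch, the header above could be viewed from C as shown below.
The names os_image_header and os_header_valid are illustrative only and are
not part of the interface. The bounds check assumes one word per entry in
the table, which this document does not state explicitly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Magic word "OSIm" read as a little-endian word (value from the table). */
#define OS_IMAGE_MAGIC 0x6D49534Fu

/* Hypothetical C view of the five-word OS header. */
typedef struct {
    uint32_t magic;        /* Word 0: "OSIm" */
    uint32_t flags;        /* Word 1: currently should be 0 */
    uint32_t image_size;   /* Word 2: image size in bytes */
    uint32_t entry_offset; /* Word 3: bytes from OS base to entry table */
    uint32_t entry_count;  /* Word 4: number of entries in table */
} os_image_header;

/* Returns 1 if the header looks like a plausible OS image, else 0.
 * The table bounds check is an extra sanity test added for this sketch,
 * assuming each table entry occupies one 32-bit word. */
int os_header_valid(const os_image_header *h)
{
    assert(h != NULL);
    if (h->magic != OS_IMAGE_MAGIC)
        return 0;
    if (h->entry_offset > h->image_size)
        return 0;
    if (h->entry_count > (h->image_size - h->entry_offset) / 4)
        return 0;
    return 1;
}
```

A prober scanning ROM for an image would apply such a check at each
candidate base address.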

The HAL itself should have whatever header is required to start the system.
For example on ARM7500 16->32 bit switch code is required, and on the
9500 parts a special ROM header and checksum must be present. A HAL
descriptor block, instead of a header, can be placed somewhere in the HAL. A
pointer to this block is passed by the HAL to the OS in the OS_Start call:

Word 0: Flags
            bit 0 => uncacheable workspace (32K) required
            bits 1-31 reserved
Word 1: Offset (bytes) from descriptor to start of HAL (will be <= 0)
Word 2: HAL size (bytes)
Word 3: Offset (bytes) from descriptor to table of HAL routine entry points
Word 4: Number of entries in table
Word 5: Size of HAL static workspace required (bytes)
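
The descriptor block can be sketched in C in the same way. The struct and
helper names here are hypothetical. Treating Words 1 and 3 as signed is an
assumption: Word 1 is stated to be <= 0, and the descriptor may sit anywhere
in the HAL, so the entry table could lie on either side of it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HAL_FLAG_UNCACHEABLE_WS (1u << 0) /* bit 0: 32K uncacheable workspace */

/* Hypothetical C view of the six-word HAL descriptor block. */
typedef struct {
    uint32_t flags;            /* Word 0 */
    int32_t  hal_start_offset; /* Word 1: descriptor to HAL start, <= 0 */
    uint32_t hal_size;         /* Word 2: bytes */
    int32_t  entry_offset;     /* Word 3: descriptor to entry table */
    uint32_t entry_count;      /* Word 4 */
    uint32_t workspace_size;   /* Word 5: static workspace, bytes */
} hal_descriptor;

/* Address of the start of the HAL, given the descriptor's own address. */
const uint8_t *hal_start(const hal_descriptor *d)
{
    assert(d != NULL);
    return (const uint8_t *)d + d->hal_start_offset;
}

/* Address of the table of HAL routine entry points. */
const uint8_t *hal_entry_table(const hal_descriptor *d)
{
    assert(d != NULL);
    return (const uint8_t *)d + d->entry_offset;
}

/* 1 if the HAL asks for the 32K uncacheable workspace, else 0. */
int hal_needs_uncacheable_ws(const hal_descriptor *d)
{
    assert(d != NULL);
    return (d->flags & HAL_FLAG_UNCACHEABLE_WS) != 0;
}
```

Because both offsets are relative to the descriptor itself, the OS needs
only the single pointer passed to it in order to find the rest of the HAL.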

Calling standards
-----------------

RISC OS and the HAL are two separate entities, potentially linked
separately. The OS and the HAL are each defined with a set of callable
routines for the OS/HAL interface. Each HAL entry or each OS entry is given
a unique (arbitrary) number, starting at 0. The offset to each entry is
given in an entry table. Calls can be made manually through this table, or
stubs could be created at run-time to allow high-level language calls.

Every entry (up to the declared maximum) must exist. If not implemented, a
failure response must be returned, or the call ignored, as appropriate. Note
that the OS interface for the HAL should not be confused with standard OS
calls (SWIs) already defined for use in the OS itself.

To permit high-level language use in the future, the procedure call standard
in both directions is the ARM-Thumb Procedure Call Standard (ATPCS) as
defined by ARM, with no use of floating point, no stack limit checking, no
frame pointers, and no Thumb interworking. HAL code is expected to be ROPI
and RWPI (ie. all its read-only segments and read-write segments are
position-independent). Hence the HAL is called with its static workspace
base (sb) in r9. The OS kernel is neither ROPI nor RWPI (except for the
pre-MMU calls, which are ROPI). OS calls from the HAL do not use r9 as a
static base.

The HAL will always be called in a privileged mode - if called in an
interrupt mode, the corresponding interrupts will be disabled. The HAL
should not change mode. HAL code should work in both 26-bit and 32-bit modes
(but should assume 32-bit configuration).

Routines can be conveniently specified in C language syntax. Typically they
will be written in assembler. In detail, the ATPCS register usage for HAL
calls is as follows:

  ATPCS  ARM    use                        at exit
  a1     r0     argument 1/return value    undefined or return value
  a2     r1     argument 2/return value    undefined or return value
  a3     r2     argument 3/return value    undefined or return value
  a4     r3     argument 4/return value    undefined or return value
  v1     r4     var 1                      preserved
  v2     r5     var 2                      preserved
  v3     r6     var 3                      preserved
  v4     r7     var 4                      preserved
  v5     r8     var 5                      preserved
  sb     r9     static workspace base      preserved
  v7     r10    var 7                      preserved
  v8     r11    var 8                      preserved
  ip     r12    scratch                    undefined
  sp     r13    stack pointer              preserved
  lr     r14    return link                undefined

The static workspace base points to the HAL workspace.

Note that HAL calls must be assumed to corrupt all of r0-r3,r12,r14. A 
function return value may be in r0, or (less commonly) multiple return
words in two or more of r0-r3.

If there are more than 4 arguments to a HAL call, arguments 5 onwards must
be pushed onto the stack before the call, and discarded after return. (The
order of arguments is with argument 5 at top of stack, ie. first to be
pulled.)

The register usage for the OS entry points is the same, except that r9 is
not used as a static base (it is preserved).

When using assembler, the register usage may seem somewhat restricted, and
cumbersome for more than 4 arguments. However, it is typically a reasonable
balance for function calls (as a PCS would aim to be), and does not preclude
implementation in C for example. Old kernel code may require register
preserving overhead to insert HAL calls easily, but for most calls this is
insignificant, compared to hardware access costs.

Initialisation sequence
-----------------------

After system reset, bootstrap code in the HAL will do minimal hardware
set-up ... blah blah

HAL entry points
----------------

These routines are expected to be called from the OS (Kernel). See the
'Calling standards' section for general information on register usage and so
forth.

Interrupts
----------

The HAL must provide the ability to identify, prioritise and mask IRQs, and
the ability to mask FIQs. RISC OS supplies the ARM's processor vectors, and
on an IRQ calls the HAL to request the identity of the highest priority
interrupt.

IRQ and FIQ device numbers are arbitrary, varying from system to system.
They should be arranged to allow quick mappings to and from hardware
registers, and should ideally be packed, starting at 0.

Timers
------

The HAL must supply at least one timer capable of generating periodic
interrupts. Each timer should generate a separate logical interrupt, and the
interrupt must be latched. The timers must either be variable rate (period is
a multiple of a basic granularity), or be fixed rate (period = 1*granularity).
Optionally, a timer may be capable of reporting the time until the next
interrupt, in units of the granularity.

Counter
-------

The HAL must supply a counter that varies rapidly, appropriate for use for
sub-millisecond timing. On many systems, this counter will form part of
timer 0 - as such it is not required to operate when timer 0 is not running.
On other systems, the periodic timers may have no readable latch, and a
separate unit will be required.

The counter should count down from (period-1) to 0 continuously.

Non-volatile memory
-------------------

The HAL should provide at least 240 bytes of non-volatile memory. If no
non-volatile memory is available, the HAL may provide fake NVRAM contents
suitable for RISC OS - however, it is preferable that the HAL just state
that NVRAM is not available, and RISC OS will act as though a CMOS reset has
been performed every reset.

NVRAM is typically implemented as an IIC device, so the calls are permitted
to be slow, and to enable interrupts. The HAL is not expected to cache
contents.

If the HAL has no particular knowledge of NVMemory, then it may just say
that "NVMemory is on IIC", and the OS will probe for CMOS/EEPROM devices on
the IIC bus.

IIC bus
-------

Many hardware designs have an IIC bus. Often, it is used only to support
non-volatile memory, but in other systems TV tuners, TV modulators,
microcontrollers, and arbitrary expansion cards may be fitted.

Low-level and high-level APIs are defined. An arbitrary number of buses is
supported, and each can be controlled by either the low or high level API.
The OS should normally only use one fixed API on each bus - mixing APIs is
unpredictable.

The low-level API requires the OS to control the two lines of the bus
directly. The high-level API currently covers version 2.1 of the IIC
protocol, and allows high-level transactions to be performed.

It is expected that a HAL will always provide the low-level API on each bus,
where possible in hardware. Using this, the OS can provide Fast mode single
or multi-master operation. The HAL may wish to provide the high-level API
where a dedicated IIC port with hardware assistance is available; this will
further permit High-speed and slave operation.

As it is possible that some HAL APIs (eg NVMemory), although abstracted at
this API layer, are still actually an IIC device, a matching set of
high-level IIC calls are provided in the OS. These give the HAL access to
the OS IIC engine, which will make low-level HAL calls. This saves the HAL
from implementing the full IIC protocol. To illustrate this diagrammatically:

    +----------+ NVMem_Read +------------+  NVMemoryRead  +------------+
    |          | ---------> |            | ------------>  |            |
    |   App    |            |     OS     |  IICTransmit   |    HAL     |
    |          |            |            | <------------  |            |
    |          |            |            |  IICSetLines   |            |
    |          |            |            | ------------>  |            |
    +----------+            +------------+                +------------+

The low-level calls should be fast. Interrupt status may not be altered.

The following structure is used:

   typedef struct { int SDA, SCL; } IICLines;

High level API to be defined ...

Video
-----

The HAL only attempts to abstract the hardware controller aspects of the OS
video. It does not (yet) consider pixel formats, framestore layout or hardware
graphics acceleration. All these would affect a great deal of RISC OS
graphics code that forms much of the value of the OS. This means that the
envisaged HAL/RISC OS combination makes some specific assumptions about
graphics framestore layout as follows:

 - memory mapped framestore
 - expected to be contiguous physical memory, can be specific memory (eg. VRAM) 
 - mapped as contiguous logical memory
 - progressive raster scan in logical memory from top left pixel to bottom right
 - start of each raster row must be word aligned
 - number of pixels in a row should be such that row is a whole number of words
 - spacing between start of each row is a constant number of words, possibly
   greater than row length (via mode variable, LineLength)
 - 1,2,4,8,16 or 32 bits per pixel (bpp)
 - little endian pixel packing for 1,2,4 bpp (least significant bits are
   leftmost pixels)
 - presence of palette assumed for 1,2,4,8 bpp (8-bits per r,g,b component in
   each entry)
 - 16 bpp format:
     bits 0-4       Red
          5-9       Green
          10-14     Blue
          15        Supremacy (0=solid, 1=transparent)
 - 32 bpp format:
     bits 0-7       Red
          8-15      Green
          16-23     Blue
          24-31     Supremacy (0=solid, 255=transparent)
 - palette words are 32 bits: 
     bits 0-7       Reserved (0), or Supremacy (0=solid, 255=transparent)
          8-15      Red
          16-23     Green
          24-31     Blue
 - pointer/cursor is assumed supported in hardware, 32x32 pixels,
   each pixel either transparent or one of 3 paletted colours
 - support for physically interlaced, logically progressive framestore via
   MMU tricks and use of LineLength mode variable, currently not fully
   integrated into kernel
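
The bit layouts listed above can be sketched as packing helpers. The
function names are hypothetical, and the asserts merely document the
component ranges implied by the field widths.

```c
#include <assert.h>
#include <stdint.h>

/* 16 bpp pixel: red bits 0-4, green 5-9, blue 10-14, supremacy bit 15.
 * Colour components are 5 bits each (0..31). */
uint16_t pack_pixel16(unsigned r5, unsigned g5, unsigned b5, unsigned s1)
{
    assert(r5 < 32 && g5 < 32 && b5 < 32 && s1 < 2);
    return (uint16_t)(r5 | (g5 << 5) | (b5 << 10) | (s1 << 15));
}

/* 32 bpp pixel: red bits 0-7, green 8-15, blue 16-23, supremacy 24-31. */
uint32_t pack_pixel32(unsigned r, unsigned g, unsigned b, unsigned s)
{
    assert(r < 256 && g < 256 && b < 256 && s < 256);
    return (uint32_t)r | ((uint32_t)g << 8) | ((uint32_t)b << 16)
         | ((uint32_t)s << 24);
}

/* Palette word: supremacy (or 0) bits 0-7, red 8-15, green 16-23,
 * blue 24-31. Note the different component order from a 32 bpp pixel. */
uint32_t pack_palette(unsigned r, unsigned g, unsigned b, unsigned s)
{
    assert(r < 256 && g < 256 && b < 256 && s < 256);
    return (uint32_t)s | ((uint32_t)r << 8) | ((uint32_t)g << 16)
         | ((uint32_t)b << 24);
}
```

For example, pack_pixel32(0x11, 0x22, 0x33, 0x44) gives 0x44332211, while
pack_palette(0x11, 0x22, 0x33, 0x00) gives 0x33221100.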

Note that it is possible to support hardware where only some pixel depths
are available, or only some fit the RISC OS assumptions. Also some hardware
has some configurability for 'arbitrary' choices like RGB versus BGR
ordering. Hence, the restrictions are typically much less severe than might
first be thought.

Supporting a software only pointer/cursor is feasible (much less work than
new pixel formats) but not yet considered.

Aside: RISC OS video interlace trick
------------------------------------

This trick has been used in NC/STB variants. It makes a physically interlaced
framestore (two distinct field stores) appear as a logically progressive
framestore, using the MMU to map many logical copies, and using the freedom
to choose a constant logical increment between rows in the RISC OS mode
definition. For 576 rows, say, it uses 576M of logical space. Each 1M
(section mapped) supports a row and allows the logical address to increment
monotonically, as the physical address alternates between (increasing rows
of) the physical field stores. It is currently not integrated into the
kernel, so it fudges address space allocation and poking of video variables.
It also has the drawback of thrashing data TLBs (one entry per row).

The trick requires the physical field stores to be separated by 1M plus half
a row. The logical spacing between rows is also set to 1M plus half a row.
The 1M logical sections are set to map alternately to the even and odd
physical fields (the second field being offset by half a row relative to 1M
alignment). Then the logical incrementing of rows maps alternately between
fields, incrementing physically by 1 row between visits to the same field.
Note that the multiple logical mapping implies an uncached screen to avoid
coherency worries, but RISC OS uses an uncached screen anyway (with the
exception of Ursula/Phoebe, now defunct).


Routines in detail
------------------

[Note, plonking all routines here possibly only temporarily. May want
routines listed in relevant sections with overview. eg. video routines
with video section, etc.]

-- HAL_Init(unsigned int *riscos_header, void *uncacheable_ws)

The OS will call HAL_Init after enabling the MMU, and initialising the HAL
workspace (filled with 0). At this point any initialisation for the main HAL
routines (rather than the early bootstrap code in the HAL) can be done.

-- HAL_IRQEnable

????

-- HAL_IRQDisable

????

-- HAL_IRQClear

????

-- HAL_IRQSource

????

-- HAL_Reset

This resets the board depending on the value in a1:
 a1 = 0  hard reset and turn the power off (ie. just turn the power off)
 a1 = 1  hard reset and leave the power on
 a1 > 1  reserved
HAL_PlatformInfo reports whether the hardware allows the power to be turned
off by software. If it does not, behaviour is as for a1 = 1.

-- int HAL_Timers(void)

Returns number of timers. Timers are numbered from 0 upwards. Timer 0 must
exist.

-- int HAL_TimerDevice(int timer)

Returns device number of timer n. A device number refers to the IRQ device
number for interrupt calls.

-- unsigned int HAL_TimerGranularity(int timer)

Returns basic granularity of timer n in ticks per second.

-- unsigned int HAL_TimerMaxPeriod(int timer)

Returns maximum period of the timer, in units of Granularity. Will be 1 for
a fixed rate timer.

-- void HAL_TimerSetPeriod(int timer, unsigned int period)

Sets period of timer n. If period > 0, the timer will generate interrupts
every (period / granularity) seconds. If period = 0, the timer may be
stopped. This may not be possible on some hardware, so the corresponding
interrupt should be masked in addition to calling this function with period
0. If period > maxperiod, behaviour is undefined.
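
As a sketch of how a caller might choose a period for a desired interrupt
rate, the hypothetical helper below takes the values that would come from
HAL_TimerGranularity and HAL_TimerMaxPeriod and clamps the result to what
the timer can do. The helper name and the clamping policy are assumptions
for illustration.

```c
#include <assert.h>

/* Choose a period (in ticks) giving roughly `hz` interrupts per second.
 * granularity: ticks per second, as returned by HAL_TimerGranularity
 * max_period:  as returned by HAL_TimerMaxPeriod (1 for fixed rate)
 * The result is clamped to 1..max_period so that it is always a value
 * HAL_TimerSetPeriod is defined to accept. */
unsigned int timer_period_for_rate(unsigned int granularity,
                                   unsigned int max_period,
                                   unsigned int hz)
{
    unsigned int period;

    assert(granularity > 0 && max_period > 0 && hz > 0);
    period = granularity / hz;   /* ticks per interrupt, rounded down */
    if (period < 1)
        period = 1;              /* requested rate faster than the timer */
    if (period > max_period)
        period = max_period;     /* requested rate slower than the timer */
    return period;
}
```

With an illustrative granularity of 2000000 ticks per second, a request for
100 interrupts per second yields a period of 20000. The caller can then read
back HAL_TimerPeriod to see what the hardware actually accepted.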

-- unsigned int HAL_TimerPeriod(int timer)

Reads period of timer n. This should be the actual period in use by the
hardware, so if for example period 0 was requested and impossible, the
actual current period should be reported.

-- unsigned int HAL_TimerReadCountdown(int timer)

Returns the time until the next interrupt in units of granularity, rounded
down. If not available, 0 is returned.

-- unsigned int HAL_CounterRate(void)

Returns the rate of the counter in ticks per second. Typically will equal
HAL_TimerGranularity(0).

-- unsigned int HAL_CounterPeriod(void)

Returns the period of the counter, in ticks. Typically will equal
HAL_TimerPeriod(0).

-- unsigned int HAL_CounterRead(void)

Reads the current counter value. Typically will equal
HAL_TimerReadCountdown(0).

-- void HAL_CounterDelay(unsigned int microseconds)

Delay for at least the specified number of microseconds.
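
One way such a delay could be built on the counter calls is sketched below.
Both helpers are hypothetical. They assume the counter counts down from
(period-1) to 0 and then reloads, and that successive reads are less than
one full period apart.

```c
#include <assert.h>
#include <stdint.h>

/* Ticks elapsed between two reads of a down-counter that runs from
 * period-1 to 0 and then reloads to period-1. */
unsigned int counter_elapsed(unsigned int earlier, unsigned int later,
                             unsigned int period)
{
    assert(earlier < period && later < period);
    if (earlier >= later)
        return earlier - later;          /* no reload in between */
    return earlier + period - later;     /* counter wrapped once */
}

/* Number of ticks needed to cover at least `microseconds`, rounding up.
 * rate is in ticks per second, as returned by HAL_CounterRate. */
uint64_t counter_ticks_for_delay(unsigned int rate, unsigned int microseconds)
{
    return ((uint64_t)rate * microseconds + 999999u) / 1000000u;
}
```

A delay loop would repeatedly call HAL_CounterRead, add counter_elapsed for
each pair of successive reads to a running total, and return once the total
reaches counter_ticks_for_delay.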

-- unsigned int HAL_NVMemoryType(void)

Returns a flags word describing the NVMemory
      bits 0-7: 0 => no NVMemory available
                1 => NVMemory may be available on the IIC bus
                2 => NVMemory is available on the IIC bus, and the
                     device characteristics are known
                3 => the HAL provides NVMemory access calls.
      bit 8:    NVMemory has a protected region at the end
      bit 9:    Protected region is software deprotectable
      bit 10:   Memory locations 0-15 are readable
      bit 11:   Memory locations 0-15 are writeable

If bits 0-7 are 0 or 1 no other NVMemory calls need be available, and bits
8-31 should be zero.

If bits 0-7 are 2, Size, ProtectedSize, Protection and IICAddress calls must
be available.

If bits 0-7 are 3, all calls except IICAddress must be available.
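
The flags word might be decoded along the following lines. The macro and
function names are illustrative, not part of the API. Type values above 3
are not defined here, so this sketch treats them as a caller error.

```c
#include <assert.h>

enum {
    NVMEM_NONE      = 0, /* no NVMemory available */
    NVMEM_MAYBE_IIC = 1, /* may be available on the IIC bus */
    NVMEM_IIC_KNOWN = 2, /* on the IIC bus, characteristics known */
    NVMEM_HAL_CALLS = 3  /* HAL provides NVMemory access calls */
};

#define NVMEM_TYPE_MASK      0xFFu
#define NVMEM_PROTECTED      (1u << 8)  /* protected region at the end */
#define NVMEM_DEPROTECTABLE  (1u << 9)  /* software deprotectable */
#define NVMEM_LOW_READABLE   (1u << 10) /* locations 0-15 readable */
#define NVMEM_LOW_WRITEABLE  (1u << 11) /* locations 0-15 writeable */

/* Access type held in bits 0-7. */
unsigned int nvmem_type(unsigned int flags)
{
    unsigned int type = flags & NVMEM_TYPE_MASK;
    assert(type <= NVMEM_HAL_CALLS);
    return type;
}

/* 1 if the HAL_NVMemoryIICAddress call must be available (type 2). */
int nvmem_has_iic_address(unsigned int flags)
{
    return nvmem_type(flags) == NVMEM_IIC_KNOWN;
}

/* 1 if the HAL's own read/write calls must be available (type 3). */
int nvmem_uses_hal_access(unsigned int flags)
{
    return nvmem_type(flags) == NVMEM_HAL_CALLS;
}

/* 1 if a protected region exists and software can deprotect it. */
int nvmem_can_deprotect(unsigned int flags)
{
    unsigned int both = NVMEM_PROTECTED | NVMEM_DEPROTECTABLE;
    return (flags & both) == both;
}

/* 1 if the word obeys the rule that bits 8-31 are zero for types 0, 1. */
int nvmem_flags_consistent(unsigned int flags)
{
    if (nvmem_type(flags) <= NVMEM_MAYBE_IIC)
        return (flags & ~NVMEM_TYPE_MASK) == 0;
    return 1;
}
```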

-- unsigned int HAL_NVMemorySize(void)

Returns the number of bytes of non-volatile memory available. Bytes 0-15
should be included in the count, so for example a Philips PCF8583 CMOS/RTC
device (as used in the Archimedes and Risc PC) would be described as a
256-byte device, with locations 0-15 not readable. More complex arrangements
would have to be abstracted out by the HAL providing its own NVMemory access
calls.

This is to suit the current RISC OS Kernel, which does not use bytes 0-15.

-- unsigned int HAL_NVMemoryProtectedSize(void)

Returns the number of bytes of NVMemory that are protected. These should be
at the top of the address space. The OS will not attempt to write to those
locations without first requesting deprotection (if available). Returns 0 if
bit 8 of the flags is clear.

-- void HAL_NVMemoryProtection(bool)

Enables (if true) or disables (if false) the protection of the software
protectable region. Does nothing if bits 8 and 9 are not both set.

-- unsigned int HAL_NVMemoryIICAddress(void)

Returns a word describing the addressing scheme of the NVRAM.
      bits 0-7:  IIC address
       
This will always be on bus zero.

-- int HAL_NVMemoryRead(unsigned int addr, void *buffer, unsigned int n)

Reads n bytes of memory from address addr onwards into the buffer supplied.
Returns the number of bytes successfully read. Under all normal
circumstances the return value will be n - if it is not, a hardware failure
is implied. Behaviour is undefined if the address range specified is outside
the NVMemory, or inside bytes 0-15, if declared unavailable.

-- int HAL_NVMemoryWrite(unsigned int addr, void *buffer, unsigned int n)

Write n bytes of memory into address addr onwards from the buffer supplied.
Returns the number of bytes successfully written. Under all normal
circumstances the return value will be n - if it is not, a hardware failure
is implied. Behaviour is undefined if the address range specified is outside
the NVMemory. Writes inside a protected region should be ignored.

-- int HAL_IICBuses(void)

Returns the number of IIC buses on the system.

-- unsigned int HAL_IICType(int bus)

Returns a flag word describing the specified IIC bus.
        bit 0: Bus supplies the low-level API
        bit 1: Bus supplies the high-level API
        bit 2: High-level API supports multi-master operation
        bit 3: High-level API supports slave operation
       bit 16: Bus supports Fast (400kbps) operation
       bit 17: Bus supports High-speed (3.4Mbps) operation
   bits 20-31: Version number of IIC supported by high-level API, * 100.
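
A sketch of decoding this flag word follows. The names are hypothetical.
The 100kbps fallback for a bus with neither speed bit set is an assumption
based on Standard mode in the IIC specification, not something stated in
this document.

```c
#include <assert.h>

#define IIC_LOW_LEVEL     (1u << 0)  /* bus supplies the low-level API */
#define IIC_HIGH_LEVEL    (1u << 1)  /* bus supplies the high-level API */
#define IIC_MULTI_MASTER  (1u << 2)  /* high-level API: multi-master */
#define IIC_SLAVE         (1u << 3)  /* high-level API: slave operation */
#define IIC_FAST          (1u << 16) /* Fast (400kbps) operation */
#define IIC_HIGH_SPEED    (1u << 17) /* High-speed (3.4Mbps) operation */

/* IIC version supported by the high-level API, times 100 (bits 20-31).
 * Only meaningful when the high-level API is present. */
unsigned int iic_version_x100(unsigned int flags)
{
    assert(flags & IIC_HIGH_LEVEL);
    return (flags >> 20) & 0xFFFu;
}

/* Fastest advertised bus speed in kbps. */
unsigned int iic_max_kbps(unsigned int flags)
{
    if (flags & IIC_HIGH_SPEED)
        return 3400;
    if (flags & IIC_FAST)
        return 400;
    return 100; /* assumed Standard mode */
}
```

For a bus reporting version 2.1 of the protocol, iic_version_x100 would
return 210.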


-- __value_in_regs IICLines HAL_IICSetLines(int bus, IICLines lines)

Sets the SDA and SCL lines on the specified bus. A 0 value represents logic
LOW, 1 logic HIGH. The function then reads back and returns the values
present on the bus, to permit arbitration.

Note the "__value_in_regs" keyword, which signifies that the binary ABI
expects SDA and SCL to be returned in registers a1 and a2.
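
To show how the read-back value supports arbitration, the sketch below
generates an IIC START condition through a function pointer standing in for
HAL_IICSetLines. Everything here is illustrative: the simulated bus, the
names, and the plain C return of IICLines (the real call uses
__value_in_regs). Real code would also need the bus timing delays between
edges, which are omitted.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int SDA, SCL; } IICLines;

/* Stand-in for HAL_IICSetLines so the sketch can run on a host. */
typedef IICLines (*iic_set_lines_fn)(int bus, IICLines lines);

/* Simulated open-drain bus: a line reads LOW if either we or another
 * device pulls it low. sim_other holds the other device's outputs. */
IICLines sim_other = { 1, 1 };
int sim_calls = 0;

IICLines sim_set_lines(int bus, IICLines lines)
{
    IICLines seen;
    (void)bus;
    sim_calls++;
    seen.SDA = lines.SDA & sim_other.SDA;
    seen.SCL = lines.SCL & sim_other.SCL;
    return seen;
}

/* Generate a START: with SCL high, take SDA from high to low, then take
 * SCL low. Returns 1 on success, or 0 if a line we released reads back
 * low, meaning another device is holding the bus. */
int iic_start(int bus, iic_set_lines_fn set_lines)
{
    IICLines want = { 1, 1 };
    IICLines seen;

    assert(set_lines != NULL);
    seen = set_lines(bus, want);      /* release both lines */
    if (!seen.SDA || !seen.SCL)
        return 0;                     /* bus busy: do not drive it */
    want.SDA = 0;
    set_lines(bus, want);             /* SDA falls while SCL high: START */
    want.SCL = 0;
    set_lines(bus, want);             /* SCL low, ready for first bit */
    return 1;
}
```

On an idle simulated bus, iic_start(0, sim_set_lines) makes three calls and
returns 1. If sim_other.SDA is set to 0 beforehand, it returns 0 after the
first call.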

-- __value_in_regs IICLines HAL_IICReadLines(int bus)

Reads the state of the IIC lines on the specified bus, without changing
their state.

Note the "__value_in_regs" keyword, which signifies that the binary ABI
expects SDA and SCL to be returned in registers a1 and a2.

-- int HAL_VideoFlybackDevice(void)

Returns the device number of the video flyback interrupt. [Note: HAL
interrupt API possibly subject to change, may affect this call.]

-- void HAL_Video_SetMode(const void *VIDCList3)

Programs the video controller to initialise a display mode. RISC OS passes a
standard VIDC List Type 3 as specified in PRM 5a-125. Note that this is a
generic video controller list, and so VIDC in this context does not refer to
any specific devices such as Acorn VIDC20.

The HAL is expected to set the video controller timings on this call. Any
palette, pixel DMA and hardware cursor settings are controlled via other
calls.

-- void HAL_Video_WritePaletteEntry(uint type, uint pcolour, uint index)

Writes a single palette entry to the video controller.

  type     = 0 for normal palette entry
             1 for border colour
             2 for pointer colour
          >= 3 reserved

  pcolour  = palette entry colour in BBGGRRSS format (Blue,Green,Red,Supremacy)

  index    = index of entry

Indices are in the range 0..255 for normal, 0 for border, 0..3 for pointer
colours. Note that RISC OS only makes calls using 1..3 for the pointer, and
pointer colour 0 is assumed to be transparent.

-- void HAL_Video_WritePaletteEntries(uint type, const uint *pcolours, 
                                      uint index, uint Nentries)

Writes a block of palette entries to the video controller.

  type     = 0 for normal palette entry
             1 for border colour
             2 for pointer colour
          >= 3 reserved

  pcolours = pointer to block of palette entry colours in BBGGRRSS format
             (Blue,Green,Red,Supremacy)

  index    = start index in palette (for first entry in block)

  Nentries = number of entries in block (must be >= 1)

Indices are in the range 0..255 for normal, 0 for border, 0..3 for pointer
colours. Note that RISC OS only makes calls using 1..3 for the pointer, and
pointer colour 0 is assumed to be transparent.

-- uint HAL_Video_ReadPaletteEntry(uint type, uint pcolour, uint index)

Returns the effective palette entry after taking into account any hardware
restrictions in the video controller, assuming it was originally programmed
with the value pcolour.

  type     = 0 for normal palette entry
             1 for border colour
             2 for pointer colour
          >= 3 reserved

  pcolour  = palette entry colour in BBGGRRSS format (Blue,Green,Red,Supremacy)

  index    = index of entry

  returns  : effective BBGGRRSS

Indices are in the range 0..255 for normal, 0 for border, 0..3 for pointer
colours. Note that RISC OS only makes calls using 1..3 for the pointer, and
pointer colour 0 is assumed to be transparent.

Depending on hardware capabilities, HALs may have to remember current
settings (eg. bits per pixel) or keep soft copies of entries. Because this
call supplies the original pcolour, this need is minimised (some HALs can
just return pcolour or a directly modified pcolour).

-- void HAL_Video_SetInterlace(uint interlace)

Sets the video interlaced sync.

  interlace = 0 or 1 for interlace off or on
              (all other values reserved)

-- void HAL_Video_SetBlank(uint blank, uint DPMS)

  blank = 0 or 1 for unblank or blank
          (all other values reserved)

  DPMS  = 0..3 as specified by monitor DPMSState (from mode file)
          0 for no DPMS power saving

The HAL is expected to attempt to turn syncs off according to DPMS, and to
turn video DMA off for blank (and therefore on for unblank) if possible. The
HAL is not expected to do anything else, eg. blank all palette entries. Such
things are the responsibility of the OS, and also this call is expected to
be fast. May be called with interrupts off.

-- void HAL_Video_SetPowerSave(uint powersave)

  powersave = 0 or 1 for power save off or on
              (all other values reserved)

The HAL is expected to perform any reasonable measures on the video
controller to save power (eg. turn off DACs), when the display is assumed
not to be required. Blanking is handled by a separate call.

[What does this really mean. What is acceptable and safe for displays? ]

-- void HAL_Video_UpdatePointer(uint flags, int x, int y, const shape_t *shape)

Update the displayed position of the current pointer shape (or turn shape
off). This call is made by the OS at a time to allow smoothly displayed
changes (on a VSync).

  flags:
    bit 0  = pointer display enable (0=off, 1=on)
    bit 1  = pointer shape update (0=no change, 1=updated)
    bits 2..31 reserved (0)

  x = x position of top left of pointer (x = 0 for left of display)

  y = y position of top left of pointer (y = 0 for top of display)

  shape points to shape_t descriptor block:
    typedef struct shape_t
    {
      uint8   width;      /* unpadded width in bytes (see notes) */
      uint8   height;     /* in pixels */
      uint8   padding[2]; /* 2 bytes of padding for field alignment */
      void   *buffLA;     /* logical address of buffer holding pixel data */
      void   *buffPA;     /* corresponding physical address of buffer */
    } shape_t;

Notes:
1) if flags bit 0 is 0 (pointer off), x, y, shape are undefined
2) the shape data from RISC OS is always padded with transparent pixels
   on the rhs, to a width of 32 pixels (8 bytes)
3) pointer clipping is the responsibility of the HAL (eg. may be able to
   allow display of pointer in border region on some h/w)
4) buffer for pixel data is aligned to a multiple of 256 bytes or better

The HAL may need to take note of the shape updated flag, and make its own
new copies if true. This is to handle cases like dual scan LCD pointer,
which typically needs two or more shape buffers for the hardware, or
possibly to handle clipping properly. This work should only be done when the
updated flag is true.

A simple HAL, where hardware permits, can use the shape data in the buffer
directly, ignoring the updated flag. The OS guarantees that the buffer data
is valid for the whole time it is to be displayed.
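
As an illustration of the flag handling described above, the following is a
minimal sketch of a HAL that keeps its own copy of the shape. It is not a
definitive implementation. The pointer_state_t structure, the function name
and the copy counter are hypothetical, the stdint types stand in for uint8,
and all hardware programming is omitted. It assumes each row of shape data
occupies 8 bytes, as stated in note 2.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define POINTER_ENABLE   (1u << 0)  /* flags bit 0: pointer display enable */
#define POINTER_UPDATED  (1u << 1)  /* flags bit 1: pointer shape updated  */
#define SHAPE_ROW_BYTES  8          /* rows padded to 32 pixels (8 bytes)  */

typedef struct shape_t
{
  uint8_t  width;      /* unpadded width in bytes */
  uint8_t  height;     /* in pixels */
  uint8_t  padding[2]; /* 2 bytes of padding for field alignment */
  void    *buffLA;     /* logical address of buffer holding pixel data */
  void    *buffPA;     /* corresponding physical address of buffer */
} shape_t;

/* Hypothetical HAL-private state; not part of the HAL API */
typedef struct pointer_state_t
{
  int      visible;
  int      x, y;
  unsigned copies_made;                  /* number of private copies taken */
  uint8_t  copy[SHAPE_ROW_BYTES * 255];  /* private copy of the shape data */
} pointer_state_t;

void sketch_update_pointer(pointer_state_t *st, unsigned flags,
                           int x, int y, const shape_t *shape)
{
  if ((flags & POINTER_ENABLE) == 0)
  {
    /* Pointer off: x, y and shape are undefined, so do not read them */
    st->visible = 0;
    return;
  }

  if (flags & POINTER_UPDATED)
  {
    /* Only take a new copy when the OS reports that the shape changed */
    assert(shape != NULL);
    memcpy(st->copy, shape->buffLA,
           (size_t) SHAPE_ROW_BYTES * shape->height);
    st->copies_made++;
  }

  st->visible = 1;
  st->x = x;
  st->y = y;
}
```

A HAL whose hardware can display the OS buffer directly would not need the
copy at all, and could simply program buffPA into the video controller.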

-- void HAL_Video_SetDAG(uint DAG, uint paddr)

Set the video DMA address generator value to the given physical address.

  DAG   = 0 set start address of current video display
          1 set start address of total video buffer
          2 set end address (exclusive) of total video buffer
          all other values reserved

  paddr = physical address for given DAG

The OS has a video buffer which is >= total display size, and may be using
bank switching (several display buffers) or hardware scroll within the total
video buffer.

  DAG=1 will be start address of current total video buffer
  DAG=2 will be end address (exclusive) of current total video buffer
  DAG=0 will be start address in buffer for current display

HALs should respond differently depending on whether hardware scroll is
supported or not. (The OS will already know this from HAL_Video_Features).

No hardware scroll:
Only DAG=0 is significant, and the end address of the current display is
implied by the size of the current mode. Calls with DAG=1,2 should be
ignored.

Hardware scroll:
DAG=0 again defines display start. DAG=2 defines the last address
(exclusive) that should be displayed before wrapping back (if reached within
display size), and DAG=1 defines the address to which accesses should wrap
back.
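
The wrapping behaviour for hardware scroll can be modelled in software as
follows. This is only a sketch of the address arithmetic, assuming the
display size never exceeds the total buffer size; the dag_state_t structure
and function name are hypothetical, and real hardware performs the
equivalent calculation itself.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical record of the three values passed to HAL_Video_SetDAG */
typedef struct dag_state_t
{
  uint32_t display_start;  /* DAG=0: start of current display       */
  uint32_t buffer_start;   /* DAG=1: start of total video buffer    */
  uint32_t buffer_end;     /* DAG=2: end of total buffer, exclusive */
} dag_state_t;

/* Physical address fetched for the byte at 'offset' into the display */
uint32_t dag_fetch_address(const dag_state_t *dag, uint32_t offset)
{
  uint32_t to_end = dag->buffer_end - dag->display_start;

  /* Assumed: the display fits within the total buffer */
  assert(offset < dag->buffer_end - dag->buffer_start);

  if (offset < to_end)
    return dag->display_start + offset;        /* before the wrap point */
  return dag->buffer_start + (offset - to_end); /* wrapped back to DAG=1 */
}
```

For example, with a buffer from &1000 to &2000 and a display start of &1C00,
offset &400 into the display is fetched from &1000.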

-- int HAL_Video_VetMode(const void *VIDClist, const void *workspace)

Allows HAL to vet a proposed mode.

[What does this really do, and what can the HAL do? Are we going to allow
changes to VIDClist by the HAL, ie. not const? Is mode workspace really ok
to pass to the HAL?]

  VIDClist  -> generic video controller list (VIDC list type 3)

  workspace -> mode workspace (if mode number), or 0

  returns 0 if OK (there may be minor adjustments to VIDClist and/or
            workspace values)
          non-zero if not OK


-- uint HAL_Video_Features(void)

Determine key features supported by the video hardware.

  returns a flags word:
     bit 0     hardware scroll is supported
     bit 1     hardware pointer/cursor is supported
     bit 2     interlace is supported with progressive framestore
     other bits reserved (returned as 0)

Bits are set for true. If bit 2 is true, then the OS assumes that a simple
progressive framestore layout is sufficient for an interlaced display (ie.
that the hardware implements the interlaced scan).

-- uint HAL_Video_PixelFormats(void)

Determine the pixel formats that are supported by the hardware.

  returns flags word:
     bit 0     1 bpp is supported
     bit 1     2 bpp is supported
     bit 2     4 bpp is supported
     bit 3     8 bpp is supported
     bit 4    16 bpp is supported
     bit 5    32 bpp is supported
     other bits reserved (returned as 0)

Bits are set for true. Bits 0-5 refer to support with standard RISC OS pixel
layout (such as little endian packing for 1,2,4 bpp, 5-5-5 RGB for 16 bpp,
etc). See the section discussing Video for more information. Other formats
may be introduced when/if RISC OS supports them.
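
A caller might test the returned flags word along these lines. This is a
sketch only; the helper names are hypothetical and simply encode the bit
assignments listed above.

```c
#include <stdint.h>

/* Map a bits-per-pixel value to its bit number in the flags word,
   or -1 if the depth has no bit assigned in the list above */
int pixel_format_bit(unsigned bpp)
{
  switch (bpp)
  {
    case 1:  return 0;
    case 2:  return 1;
    case 4:  return 2;
    case 8:  return 3;
    case 16: return 4;
    case 32: return 5;
    default: return -1;
  }
}

/* Non-zero if 'formats' (as returned by HAL_Video_PixelFormats)
   reports support for the given depth */
int bpp_is_supported(uint32_t formats, unsigned bpp)
{
  int bit = pixel_format_bit(bpp);
  return bit >= 0 && ((formats >> bit) & 1u) != 0;
}
```

For instance, a flags word of &3C would indicate 4, 8, 16 and 32 bpp support
but not 1 or 2 bpp.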

-- uint HAL_Video_BufferAlignment(void)

Determine the framestore buffer alignment required by the hardware.

  returns an unsigned integer:
    the required alignment for the framestore buffer, in bytes
    (expected to be a power of 2)
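
Given a power of 2 alignment, a framestore address can be rounded up with
simple masking. The helpers below are a hypothetical sketch of that
arithmetic and are not part of the HAL API.

```c
#include <assert.h>
#include <stdint.h>

/* Non-zero if v is a power of 2 (zero is not) */
int is_power_of_two(uint32_t v)
{
  return v != 0 && (v & (v - 1)) == 0;
}

/* Round addr up to the next multiple of alignment (a power of 2) */
uint32_t align_up(uint32_t addr, uint32_t alignment)
{
  assert(is_power_of_two(alignment));
  return (addr + alignment - 1) & ~(alignment - 1);
}
```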


-- HAL_MatrixColumns

???

-- HAL_MatrixScan

???

-- HAL_TouchscreenType

???

-- HAL_TouchscreenRead

???

-- unsigned int64 HAL_MachineID(void)

Returns a 64-bit unique machine identifier. This may later be used to form
the ethernet MAC address, but otherwise has no great significance on
non-networked machines.

The top 8 bits are a CRC, based on the same algorithm the original DS2401
used. If the CRC fails, zero will be substituted.
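
The check might look like the sketch below. It assumes that "the same
algorithm" means the Dallas/Maxim 1-Wire CRC-8 (polynomial x^8 + x^5 + x^4
+ 1, bits processed least significant first, initial value 0), and that the
CRC covers the low 56 bits of the identifier taken least significant byte
first. Both the byte order and the function names are assumptions made for
illustration, not something this document specifies.

```c
#include <stdint.h>

/* Dallas/Maxim 1-Wire CRC-8, bitwise form: reflected polynomial &8C */
uint8_t onewire_crc8(const uint8_t *data, unsigned len)
{
  uint8_t crc = 0;
  unsigned i;

  for (i = 0; i < len; i++)
  {
    uint8_t in = data[i];
    int bit;

    for (bit = 0; bit < 8; bit++)
    {
      uint8_t mix = (uint8_t) ((crc ^ in) & 1u);
      crc >>= 1;
      if (mix)
        crc ^= 0x8C;
      in >>= 1;
    }
  }
  return crc;
}

/* Non-zero if the top 8 bits of id match the CRC of the low 56 bits.
   Assumed byte order: least significant byte first. */
int machine_id_crc_ok(uint64_t id)
{
  uint8_t bytes[7];
  unsigned i;

  for (i = 0; i < 7; i++)
    bytes[i] = (uint8_t) (id >> (8 * i));
  return onewire_crc8(bytes, 7) == (uint8_t) (id >> 56);
}
```

A useful property of this CRC is that running it over all eight bytes,
including the CRC byte itself, gives zero for a valid identifier.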

-- void *HAL_ControllerAddress(unsigned controller)

Asks the HAL where various controllers might or might not be.
Podule manager uses this information to determine at run time whether or not
to bother doing anything.

Returns r0=logical address of the chosen controller, or zero

   0 = EASI card access speed control
   1 = EASI space(s)
   2 = VIDC1
   3 = VIDC20
   4 = S space base (IOMD, podules, NICs, etc)
   5 = Extension ROM(s)

-- HALEntry HAL_HardwareInfo

See OS_ReadSysInfo reason code 2

-- HALEntry HAL_SuperIOInfo

See OS_ReadSysInfo reason code 3

-- void HAL_PlatformInfo(unsigned int unused, unsigned int *flags,
                         unsigned int *defined_flags)

See OS_ReadSysInfo reason code 8

RISC OS entry points from HAL init
----------------------------------

These are entry points into the OS, called from the HAL.

-- void RISCOS_InitARM(unsigned int flags)

    flags: reserved - sbz

On entry:
  SVC mode
  MMU and caches off
  IRQs and FIQs disabled
  No RAM or stack used

On exit:
  Instruction cache may be on

This routine must be called once very early on in the HAL start-up, to
accelerate the CPU for the rest of HAL initialisation. Typically, it will
just enable the instruction cache (if possible on the ARM in use), and
ensure that the processor is in 32-bit configuration and mode.

Some architecture 4 (and later) ARMs have bits in the control register that
affect the hardware layer - eg the iA and nF bits in the ARM920T. These are
the HAL's responsibility - the OS will not touch them. Conversely, the HAL
should not touch the cache, MMU and core configuration bits (currently bits
0-14).
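
The division of responsibility can be expressed as a mask. The sketch below
shows one way a HAL might combine its own control register bits with the
current value while leaving bits 0-14 untouched. The macro and function
names are hypothetical, and the actual coprocessor read and write are
omitted.

```c
#include <stdint.h>

/* Bits 0-14: cache, MMU and core configuration, owned by the OS */
#define OS_OWNED_CONTROL_BITS 0x00007FFFu

/* Take the OS-owned bits from 'current' and all other bits from
   'hal_wanted', so the HAL never alters bits 0-14 */
uint32_t hal_merge_control(uint32_t current, uint32_t hal_wanted)
{
  return (current & OS_OWNED_CONTROL_BITS)
       | (hal_wanted & ~OS_OWNED_CONTROL_BITS);
}
```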

On architecture 3, the control register is write only - the OS will set bits
11-31 to zero.

Likewise, such things as the StrongARM 110's register 15 (Test, Clock and
Idle Control) are the HAL's responsibility. The OS does not know about the
configuration of the system, so cannot program such registers.

This entry must not be called after RISCOS_Start.

-- void *RISCOS_AddRAM(unsigned int flags, void *start, void *end, 
                       uintptr_t sigbits, void *ref)
   flags
        bit 0: video memory (only first contiguous range will be used)
        bit 1: video memory is not suitable for general use
        bits 8-11: speed indicator (arbitrary, higher => faster)
        other bits reserved (SBZ)
   start
        start address of RAM (inclusive) (no alignment requirements)
   end
        end address of RAM (exclusive) (no alignment requirements, but must be >= start)
   sigbits
        significant address bit mask (1 => this bit of addr decoded, 0 => this bit ignored)
   ref
        reference handle (NULL for first call)

Returns ref for next call

On entry:
  SVC32 mode
  MMU and data cache off
  IRQs and FIQs disabled

This entry point must be the first call from the HAL to RISC OS following a
hardware reset. It may be called as many times as necessary to enumerate all
RAM that is available for general purpose use. It should only be called to
declare video memory if the video memory may be used as normal RAM when in
small video modes.

To permit software resets:
    The HAL must be non-destructive of any declared RAM outside the first 4K of the first
    block.
    The stack pointer should be initialised 4K into the first block, or in some non-
    declared RAM.
    Must present memory in a fixed order on any given system.

The first block must be at least 256K and 16K aligned.
Block coalescing only works well if RAM banks are added in ascending address order.

RISC OS will use RAM at the start of the first block as initial workspace.
Max usage is 16 bytes per block + 32 (currently 8 per block + 4). This
limits the number of discontiguous blocks (although RISC OS will concatenate
contiguous blocks where possible).

This call must not be made after RISCOS_Start.
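
The calling sequence, with ref threaded from one call to the next, might be
sketched as follows. To keep the example runnable away from the target, the
OS entry point is passed in as a function pointer and a stand-in
(mock_add_ram) takes the place of RISCOS_AddRAM. The ram_bank_t structure,
the helper names, and the choice of all ones for sigbits are assumptions
made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one RAM bank, as known to the HAL */
typedef struct ram_bank_t
{
  uintptr_t start;  /* inclusive */
  uintptr_t end;    /* exclusive */
} ram_bank_t;

/* Same shape as the RISCOS_AddRAM prototype above */
typedef void *(*add_ram_fn)(unsigned int flags, void *start, void *end,
                            uintptr_t sigbits, void *ref);

/* Stand-in for RISCOS_AddRAM: counts calls and returns a non-NULL ref */
unsigned mock_calls;
int      mock_first_ref_was_null;

void *mock_add_ram(unsigned int flags, void *start, void *end,
                   uintptr_t sigbits, void *ref)
{
  (void) flags; (void) start; (void) end; (void) sigbits;
  if (mock_calls == 0)
    mock_first_ref_was_null = (ref == NULL);
  mock_calls++;
  return &mock_calls;
}

/* Declare each bank in turn (ascending address order is assumed) and
   return the final ref, which is later passed to RISCOS_Start */
void *declare_banks(add_ram_fn add_ram, const ram_bank_t *banks,
                    unsigned count)
{
  void *ref = NULL;  /* NULL for first call */
  unsigned i;

  for (i = 0; i < count; i++)
  {
    assert(banks[i].end >= banks[i].start);
    ref = add_ram(0, (void *) banks[i].start, (void *) banks[i].end,
                  ~(uintptr_t) 0, ref);
  }
  return ref;
}

/* Upper bound on initial workspace: 16 bytes per block + 32 */
size_t max_workspace_bytes(unsigned blocks)
{
  return 16u * blocks + 32u;
}
```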


-- void RISCOS_Start(unsigned int flags, int *riscos_header,
                     int *hal_entry_table, void *ref)

   flags
        bit 0: power on reset
        bit 1: CMOS reset inhibited (eg protection link on Risc PC)
        bit 2: perform a CMOS reset (if bit 1 clear and bit 0 set - eg front panel
                                     button held down on an NC)
        bit 3: there is no CMOS (the Kernel must use a RAM cache)
        bit 4: the RAM has already been cleared to zero

On entry:
  SVC32 mode
  MMU and data cache off
  IRQs and FIQs disabled

This routine must be called after all calls to RISCOS_AddRAM have been
completed. It does not return. Future calls back to the HAL are via the HAL
entry table, after the MMU has been enabled.
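
The flag bits can be given names, and the condition attached to bit 2
written out explicitly. The macro names and the helper below are
hypothetical, and the helper reflects one reading of the bit 2 description
(a CMOS reset is honoured only on a power on reset with the inhibit bit
clear).

```c
/* Hypothetical names for the RISCOS_Start flag bits listed above */
#define START_POWER_ON_RESET     (1u << 0)
#define START_CMOS_RESET_INHIBIT (1u << 1)
#define START_CMOS_RESET_REQUEST (1u << 2)
#define START_NO_CMOS            (1u << 3)
#define START_RAM_CLEARED        (1u << 4)

/* Non-zero if a CMOS reset should be performed: bit 2 set, and also
   bit 0 set and bit 1 clear */
int cmos_reset_wanted(unsigned int flags)
{
  return (flags & START_CMOS_RESET_REQUEST) != 0
      && (flags & START_POWER_ON_RESET) != 0
      && (flags & START_CMOS_RESET_INHIBIT) == 0;
}
```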


-- void *RISCOS_MapInIO(unsigned int flags, void *phys, unsigned int size)

   flags: bit 2 => make memory bufferable
    phys: physical address to map in
    size: number of bytes of memory to map in

This routine is used to map in IO memory for the HAL's usage. Normally it
would only be called during HAL_Init(). Once mapped in the IO space cannot
be released.

It returns the resultant virtual address corresponding to phys, or 0 for
failure. Failure can only occur if no RAM is available for page tables, or
if the virtual address space is exhausted.


-- void RISCOS_AddDevice(unsigned int flags, struct device *d)


-- uint64_t RISCOS_LogToPhys(const void *log)


-- int RISCOS_IICOpV(IICDesc *descs, uint32_t ndesc_and_bus)


-- void *RISCOS_MapInIO64(unsigned int flags, uint64_t phys, unsigned int size)

As for RISCOS_MapInIO, but accepting a 64-bit physical address argument.


-- void *RISCOS_AccessPhysicalAddress(unsigned int flags, uint64_t phys, void **oldp)

   flags: bit 2 => make memory bufferable
          other bits must be zero
    phys: physical address to access
    oldp: pointer to location to store old state (or NULL)

On entry:
  Privileged mode
  MMU on
  FIQs on
  Re-entrant

On exit:
  Returns logical address corresponding to phys

Arranges for the physical address phys to be mapped in to logical memory. In
fact, at least the whole megabyte containing "phys" is mapped in (ie if phys =
&12345678, then &12300000 to &123FFFFF become available). The memory is
supervisor access only, non-cacheable, non-bufferable by default, and will
remain available until the next call to RISCOS_Release/AccessPhysicalAddress
(although interrupt routines or subroutines may temporarily map in something
else).
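
The extent of the megabyte that becomes available follows directly from the
physical address. The helpers below are a hypothetical sketch of that
arithmetic only; they do not perform any mapping.

```c
#include <stdint.h>

#define MEGABYTE 0x100000u

/* First physical address of the megabyte containing phys */
uint64_t megabyte_base(uint64_t phys)
{
  return phys & ~(uint64_t) (MEGABYTE - 1);
}

/* Last physical address of the megabyte containing phys */
uint64_t megabyte_last(uint64_t phys)
{
  return megabyte_base(phys) | (MEGABYTE - 1);
}

/* Non-zero if both addresses are covered by the same mapping */
int same_megabyte(uint64_t a, uint64_t b)
{
  return megabyte_base(a) == megabyte_base(b);
}
```

With phys = &12345678 these give &12300000 and &123FFFFF, matching the
example above.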

When finished, the user should call RISCOS_ReleasePhysicalAddress.

-- void RISCOS_ReleasePhysicalAddress(void *old)

  old: state returned from a previous call to RISCOS_AccessPhysicalAddress

On entry:
  MMU on
  FIQs on
  Re-entrant

Usage:
  Call with the value output from a previous RISCOS_AccessPhysicalAddress.

Example:

  void *old;
  uint64_t addr_physical = (uint64_t) 0x80005000;
  uint64_t addr2_physical = (uint64_t) 0x90005000;
  uint32_t *addr_logical;
  uint32_t *addr2_logical;

  addr_logical = (uint32_t *) RISCOS_AccessPhysicalAddress(0, addr_physical, &old);
  addr_logical[0] = 3; addr_logical[1] = 5;

  addr2_logical = (uint32_t *) RISCOS_AccessPhysicalAddress(0, addr2_physical, NULL);
  *addr2_logical = 7;

  RISCOS_ReleasePhysicalAddress(old);