Commit 6a293f53 authored by Mike Stephens's avatar Mike Stephens

kernel now attempts to substitute video mode numbers in face of h/w with limited bits-per-pixel support (not tested yet)

HAL_API document added - early draft only, of interest to those
writing or modifying HALs for new h/w
ARMop_API document added - early draft only, of interest only
to those modifying kernel to support new ARM cores
*** polite comments on HAL_API welcome ***

Version 5.35, 4.79.2.15. Tagged as 'Kernel-5_35-4_79_2_15'
parent cdf980ed
12345678901234567890123456789012345678901234567890123456789012345678901234567890
mjs 12 Jan 2001 Early Draft
RISC OS Kernel ARM core support
===============================
This document is concerned with the design of open-ended support for
multiple ARM cores within the RISC OS kernel, as part of the work loosely
termed hardware abstraction. Note that the ARM core support is part of the
OS kernel, and so is not part of the hardware abstraction layer (HAL)
itself.
Background
----------
ARM core support (including caches and MMU) has historically been coded in a
tailored way for one or two specific variants. Since version 3.7 this has
meant just two variants; ARM 6/7 and StrongARM SA110. A more generic
approach is required for the next generation. This aims both to support
several cores in a more structured way, and to cover minor variants (eg.
cache size) with the same support code. The natural approach is to set up
run-time vectors to a set of ARM support routines.
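As a sketch, the run-time vectors might look like the following C model. All names here are invented for illustration; the real kernel binds ARM assembler routines (not C functions) into its vectors at boot, after identifying the core from the CP15 ID register.

```c
#include <assert.h>
#include <string.h>

/* Illustrative model of run-time vectored ARM support routines: a
   struct of function pointers is filled in once at boot, and all
   later kernel code dispatches through it.  Strings stand in for the
   real CP15 cache/TLB work. */
typedef struct {
    const char *(*cache_clean_invalidate_all)(void);
    const char *(*tlb_invalidate_all)(void);
} ARMops;

static const char *arm9_cache_op(void)  { return "clean+invalidate by index/segment"; }
static const char *sa110_cache_op(void) { return "clean by reading a dummy area"; }
static const char *generic_tlb_op(void) { return "invalidate whole TLB"; }

ARMops armops;   /* the vectors the rest of the kernel calls through */

void armops_init(int is_strongarm)
{
    /* In reality this selection would key off the CP15 ID register. */
    armops.cache_clean_invalidate_all =
        is_strongarm ? sa110_cache_op : arm9_cache_op;
    armops.tlb_invalidate_all = generic_tlb_op;
}
```

The point of the indirection is that minor core variants need only rebind one or two vectors rather than requiring a new build of the kernel.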
Note that it is currently assumed that the ARM MMU architecture will not
change radically in future ARM cores. Hence, the kernel memory management
algorithms remain largely unchanged. This is believed to be a reasonable
assumption, since the last major memory management change was with Risc PC
and ARM 610 (when the on-chip MMU was introduced).
Note that all ARM support code must be 32-bit clean, as part of the 32-bit
clean kernel.
Survey of ARM core requirements
-------------------------------
At present, five broad ARM core types can be considered to be of interest;
ARM7 (and ARM6), ARM9, ARM10, StrongARM (SA1) and XScale. These divide
primarily in terms of cache types, and cache and TLB maintenance
requirements. They also span a range of defined ARM architecture versions,
which introduce differences in the system control operations (primarily
coprocessor 15 instructions).
The current ARM architecture is version 5. This (and version 4) has some
open-ended definitions that allow code to determine cache sizes and types from
CP15 registers. Hence, the design of the support code can hope to be at
least tolerant of near future variations that are introduced.
ARM7
----
ARM7 cores may be architecture 3 or 4. They differ in required coprocessor
15 operations for the same cache and TLB control. ARM6 cores are much the
same as architecture 3 ARM7. The general character of all these cores is of
unified write-through caches that can only be invalidated on a global basis.
The TLBs are also unified, and can be invalidated per entry or globally.
ARM9
----
ARM9 cores are architecture 4. We ignore ARM9 variants without an MMU. The
kernel can read cache size and features. The ARM 920 or 922 have Harvard
caches, with writeback and writethrough capable data caches (on a page or
section granularity). Data and instruction caches can be invalidated by
individual lines or globally. The data cache can be cleaned by virtual
address or cache segment/index, allowing for efficient cache maintenance.
Data and instruction TLBs can be invalidated by entry or globally.
ARM10
-----
ARM 10 is architecture 5. Few details available at present. Likely to be
similar to ARM9 in terms of cache features and available operations.
StrongARM
---------
StrongARM is architecture 4. StrongARMs have Harvard caches, the data cache
being writeback only (no writethrough option). The data cache can only be
globally cleaned in an indirect manner, by reading from otherwise unused
address space. This is inefficient because it requires external (to the
core) reads on the bus. In particular, the minimum cost of a clean, for a
nearly clean cache, is high. The data cache supports clean and invalidate by
individual virtual lines, so this is reasonably efficient for small ranges
of address. The data TLB can be invalidated by entry or globally.
The instruction cache can only be invalidated globally. This is inefficient
for cases such as IMBs over a small range (dynamic code). The instruction
TLB can only be invalidated globally.
Some StrongARM variants have a mini data cache. This is selected over the
main cache on a section or page basis by setting the cachable/bufferable
bits to C=1,B=0 in the MMU (this is not standard ARM architecture). The mini data
cache is writeback and must be cleaned in the same manner as the main data
cache.
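The cost of the StrongARM indirect clean can be seen from a small C model. The geometry used here (16KB data cache, 32-byte lines) matches the SA-110; the function merely counts the loads the clean performs, since every line of the otherwise-unused "dirty" area must be read regardless of how little of the cache is actually dirty.

```c
#include <assert.h>

/* Model of the SA-110 indirect global data cache clean: one load per
   cache line across the whole cache-sized dummy area.  Each load can
   evict (and so write back) one potentially dirty line. */
enum { DCACHE_SIZE = 16 * 1024,   /* SA-110 data cache size */
       LINE_SIZE   = 32 };        /* bytes per cache line */

/* Returns how many loads (external bus reads) a full clean costs.
   Note this is a fixed cost: a nearly clean cache pays it all. */
unsigned sa_clean_load_count(void)
{
    unsigned loads = 0;
    for (unsigned addr = 0; addr < DCACHE_SIZE; addr += LINE_SIZE)
        loads++;                  /* models: LDR from dummy_area + addr */
    return loads;
}
```

This fixed minimum cost is exactly why the document flags the StrongARM global clean as inefficient compared with the per-line clean used for small address ranges.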
XScale
------
XScale is architecture 5. It implements Harvard caches, the data cache being
writeback or writethrough (on a page or section granularity). Data and
instruction caches can be invalidated by individual lines or globally. The
data cache can be fully cleaned by allocating lines from otherwise unused
address space. Unlike StrongARM, no external reads are needed for the clean
operation, so that cache maintenance is efficient.
XScale has a mini data cache. This is only available by using extension bits
in the MMU. This extension is not documented in the current manual for
architecture 5, but will presumably be properly recognised by ARM. It should
be a reasonably straightforward extension for RISC OS. The mini data cache
can only be cleaned by inefficient indirect reads as on StrongARM. However,
for XScale, the whole mini data cache can be configured as writethrough to
obviate this problem. The most likely use for RISC OS is to map screen
memory as mini cacheable, when writethrough caching will also be highly
desirable to prevent delayed screen update.
The instruction and data TLBs can each be invalidated by entry or globally.
Kernel ARM operations
---------------------
This section lists the definitions and API of the set of ARM operations
required by the kernel for each major ARM type that is to be supported. Some
operations may be very simple on some ARMs. Others may need support from the
kernel environment - for example, readable parameters that have been
determined at boot, or address space available for cache clean operations.
The general rules for register usage and preservation in calling these
operations are:
- any parameters are passed in r0, r1 etc. as required
- r0 may be used as a scratch register
- the routines see a valid stack via sp; at least 16 words are available
- lr is the return link as required
- on exit, all registers except r0 and lr must be preserved
Note that where register values are given as logical addresses, these are
RISC OS logical addresses. The equivalent ARM terminology is virtual address
(VA), or modified virtual address (MVA) for architectures with the fast
context switch extension.
Note also that where cache invalidation is required, it is implicit that any
associated operations for a particular ARM should be performed as well. The
most obvious example is an ARM with branch prediction, where it may be
necessary to invalidate a branch cache wherever instruction cache
invalidation is performed.
Any operation that is a null operation on the given ARM should be
implemented as a single return instruction:
MOV pc, lr
-- Cache_CleanInvalidateAll
The cache or caches are to be globally invalidated, with cleaning of any
writeback data being properly performed.
entry: -
exit: -
IRQs are enabled
call is not reentrant
Note that any write buffer draining should also be performed by this
operation, so that memory is fully updated with respect to any writeback
data.
The OS only expects the invalidation to be with respect to instructions/data
that are not involved in any currently active interrupts. In other words, it
is expected and desirable that interrupts remain enabled during any extended
clean operation, in order to avoid impact on interrupt latency.
-- Cache_CleanAll
The unified cache or data cache is to be globally cleaned (any writeback data
updated to memory). Invalidation is not required.
entry: -
exit: -
IRQs are enabled
call is not reentrant
Note that any write buffer draining should also be performed by this
operation, so that memory is fully updated with respect to any writeback
data.
The OS only expects the cleaning to be with respect to data that are not
involved in any currently active interrupts. In other words, it is expected
and desirable that interrupts remain enabled during any extended clean
operation, in order to avoid impact on interrupt latency.
-- Cache_InvalidateAll
The cache or caches are to be globally invalidated. Cleaning of any writeback
data is not to be performed.
entry: -
exit: -
IRQs are enabled
call is not reentrant
This call is only required for special restart use, since it implies that
any writeback data are either irrelevant or not valid. It should be a very
simple operation on all ARMs.
-- Cache_RangeThreshold
Return a threshold value for an address range, above which it is advisable
to globally clean and/or invalidate caches, for performance reasons. For a
range less than or equal to the threshold, a ranged cache operation is
recommended.
entry: -
exit: r0 = threshold value (bytes)
IRQs are enabled
call is not reentrant
This call returns a value that the kernel may use to select between strategies
in some cache operations. This threshold may also be of use to some of the
ARM operations themselves (although they should typically be able to read
the parameter more directly).
The exact value is unlikely to be critical, but a sensible value may depend
on both the ARM and external factors such as memory bus speed.
-- TLB_InvalidateAll
The TLB or TLBs are to be globally invalidated.
entry: -
exit: -
IRQs are enabled
call is not reentrant
-- TLB_InvalidateEntry
The TLB or TLBs are to be invalidated for the entry at the given logical
address.
entry: r0 = logical address of entry to invalidate (page aligned)
exit: -
IRQs are enabled
call is not reentrant
The address will always be page aligned (4k).
-- WriteBuffer_Drain
Any writebuffers are to be drained so that any pending writes are guaranteed
completed to memory.
entry: -
exit: -
IRQs are enabled
call is not reentrant
-- IMB_Full
A global instruction memory barrier (IMB) is to be performed.
entry: -
exit: -
IRQs are enabled
call is not reentrant
An IMB is an operation that should be performed after new instructions have
been stored and before they are executed. It guarantees correct operation
for code modification (eg. something as simple as loading code to be
executed).
On some ARMs, this operation may be null. On ARMs with a Harvard
architecture, this typically consists of:
1) clean data cache
2) drain write buffer
3) invalidate instruction cache
There may be other considerations such as invalidating branch prediction
caches.
-- IMB_Range
An instruction memory barrier (IMB) is to be performed over a logical
address range.
entry: r0 = logical address of start of range
r1 = logical address of end of range (exclusive)
Note that r0 and r1 are aligned on cache line boundaries
exit: -
IRQs are enabled
call is not reentrant
An IMB is an operation that should be performed after new instructions have
been stored and before they are executed. It guarantees correct operation
for code modification (eg. something as simple as loading code to be
executed).
On some ARMs, this operation may be null. On ARMs with a Harvard
architecture, this typically consists of:
1) clean data cache over the range
2) drain write buffer
3) invalidate instruction cache over the range
There may be other considerations such as invalidating branch prediction
caches.
Note that the range may be very large. The implementation of this call is
typically expected to use a threshold (related to Cache_RangeThreshold) to
decide when to perform IMB_Full instead, which is faster for large ranges.
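The threshold decision reduces to a comparison; a C model follows. The threshold value and the names here are illustrative only; a real implementation would take the value from Cache_RangeThreshold (or read the boot-time parameter directly).

```c
#include <assert.h>
#include <stdint.h>

/* Model of the IMB_Range strategy selection: small ranges get the
   per-line clean/drain/invalidate sequence, large ranges fall back to
   the global IMB_Full, which is cheaper above some threshold. */
enum { RANGE_THRESHOLD = 8 * 1024 };   /* bytes; assumed value */

typedef enum { IMB_BY_RANGE, IMB_FULL } imb_strategy;

imb_strategy imb_choose(uint32_t start, uint32_t end)
{
    /* start/end are cache-line aligned; end is exclusive */
    return (end - start <= RANGE_THRESHOLD) ? IMB_BY_RANGE : IMB_FULL;
}
```

The same pattern applies to MMU_ChangingEntries below, where the count of page entries (rather than a byte range) is compared against a threshold before falling back to the global operation.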
-- MMU_Changing
The global MMU mapping is about to be changed.
entry: -
exit: -
IRQs are enabled
call is not reentrant
The operation must typically perform the following:
1) globally clean and invalidate all caches
2) drain write buffer
3) globally invalidate TLB or TLBs
Note that it should not be necessary to disable IRQs. The OS ensures that
remappings do not affect currently active interrupts.
-- MMU_ChangingEntry
The MMU mapping is about to be changed for a single page entry (4k).
entry: r0 = logical address of entry (page aligned)
exit: -
IRQs are enabled
call is not reentrant
The operation must typically perform the following:
1) clean and invalidate all caches over the 4k range of the page
2) drain write buffer
3) invalidate TLB or TLBs for the entry
Note that it should not be necessary to disable IRQs. The OS ensures that
remappings do not affect currently active interrupts.
-- MMU_ChangingUncached
The MMU mapping is about to be changed in a way that globally affects
uncacheable space.
entry: -
exit: -
IRQs are enabled
call is not reentrant
The operation must typically globally invalidate the TLB or TLBs. The OS
guarantees that cacheable space is not affected, so cache operations are not
required. However, there may still be considerations such as fill buffers
that operate in uncacheable space on some ARMs.
-- MMU_ChangingUncachedEntry
The MMU mapping is about to be changed for a single uncacheable page entry
(4k).
entry: r0 = logical address of entry (page aligned)
exit: -
IRQs are enabled
call is not reentrant
The operation must typically invalidate the TLB or TLBs for the entry. The
OS guarantees that cacheable space is not affected, so cache operations are
not required. However, there may still be considerations such as fill
buffers that operate in uncacheable space on some ARMs.
-- MMU_ChangingEntries
The MMU mapping is about to be changed for a contiguous range of page
entries (multiple of 4k).
entry: r0 = logical address of first page entry (page aligned)
r1 = number of page entries ( >= 1)
exit: -
IRQs are enabled
call is not reentrant
The operation must typically perform the following:
1) clean and invalidate all caches over the range of the pages
2) drain write buffer
3) invalidate TLB or TLBs over the range of the entries
Note that it should not be necessary to disable IRQs. The OS ensures that
remappings do not affect currently active interrupts.
Note that the number of entries may be large. The operation is typically
expected to use a reasonable threshold, above which it performs a global
operation instead for speed reasons.
-- MMU_ChangingUncachedEntries
The MMU mapping is about to be changed for a contiguous range of uncacheable
page entries (multiple of 4k).
entry: r0 = logical address of first page entry (page aligned)
r1 = number of page entries ( >= 1)
exit: -
IRQs are enabled
call is not reentrant
The operation must typically invalidate the TLB or TLBs over the range of
the entries. The OS guarantees that cacheable space is not affected, so
cache operations are not required. However, there may still be
considerations such as fill buffers that operate in uncacheable space on
some ARMs.
Note that the number of entries may be large. The operation is typically
expected to use a reasonable threshold, above which it performs a global
operation instead for speed reasons.
......@@ -13,12 +13,12 @@
GBLS Module_ComponentPath
Module_MajorVersion SETS "5.35"
Module_Version SETA 535
Module_MinorVersion SETS "4.79.2.14"
Module_Date SETS "09 Jan 2001"
Module_ApplicationDate2 SETS "09-Jan-01"
Module_ApplicationDate4 SETS "09-Jan-2001"
Module_MinorVersion SETS "4.79.2.15"
Module_Date SETS "12 Jan 2001"
Module_ApplicationDate2 SETS "12-Jan-01"
Module_ApplicationDate4 SETS "12-Jan-2001"
Module_ComponentName SETS "Kernel"
Module_ComponentPath SETS "RiscOS/Sources/Kernel"
Module_FullVersion SETS "5.35 (4.79.2.14)"
Module_HelpVersion SETS "5.35 (09 Jan 2001) 4.79.2.14"
Module_FullVersion SETS "5.35 (4.79.2.15)"
Module_HelpVersion SETS "5.35 (12 Jan 2001) 4.79.2.15"
END
......@@ -4,19 +4,19 @@
*
*/
#define Module_MajorVersion_CMHG 5.35
#define Module_MinorVersion_CMHG 4.79.2.14
#define Module_Date_CMHG 09 Jan 2001
#define Module_MinorVersion_CMHG 4.79.2.15
#define Module_Date_CMHG 12 Jan 2001
#define Module_MajorVersion "5.35"
#define Module_Version 535
#define Module_MinorVersion "4.79.2.14"
#define Module_Date "09 Jan 2001"
#define Module_MinorVersion "4.79.2.15"
#define Module_Date "12 Jan 2001"
#define Module_ApplicationDate2 "09-Jan-01"
#define Module_ApplicationDate4 "09-Jan-2001"
#define Module_ApplicationDate2 "12-Jan-01"
#define Module_ApplicationDate4 "12-Jan-2001"
#define Module_ComponentName "Kernel"
#define Module_ComponentPath "RiscOS/Sources/Kernel"
#define Module_FullVersion "5.35 (4.79.2.14)"
#define Module_HelpVersion "5.35 (09 Jan 2001) (4.79.2.14)"
#define Module_FullVersion "5.35 (4.79.2.15)"
#define Module_HelpVersion "5.35 (12 Jan 2001) (4.79.2.15)"
......@@ -2392,6 +2392,7 @@ MMUControl_Flush
TST r0,#&80000000
BEQ MMUC_flush_flushT
ARMop Cache_CleanInvalidateAll,,,r1
LDR r0, [sp]
MMUC_flush_flushT
TST r0,#&40000000
BEQ MMUC_flush_done
......
......@@ -168,15 +168,12 @@ VduInit ROUT
STR r0, [r4, #HWPixelFormats]
mjsCallHAL HAL_Video_Features
STR r0, [r4, #HWVideoFeatures]
mjsCallHAL HAL_Video_Features
STR r0, [r4, #HWPixelFormats]
mjsCallHAL HAL_Video_BufferAlignment
STR r0, [r4, #HWBufferAlign]
Pull "r4, r9, r12"
;;; sort this out!
! 0, "mjsHAL not doing anything useful with HAL_Video_PixelFormats"
! 0, "mjsHAL not doing anything useful with HAL_Video_bufferAlign"
! 0, "mjsHAL not doing anything useful with HAL_Video_BufferAlignment"
! 0, "mjsHAL not dealing with lack of h/w pointer"
LDR R0, =RangeC+SpriteReason_SwitchOutputToSprite
......@@ -607,6 +604,75 @@ CursorNbitTab
& Cursor16bit-CursorNbitTab
& Cursor32bit-CursorNbitTab
; table of substitute mode numbers to cater for hardware that might
; not support all of 1,2,4,8 bpp (bits per pixel) modes
;
; indexed by mode number (0..49), pairs of byte values:
; bpp = bits per pixel of this mode number
; promo = promoted mode number (0..49), or &FF if none
;
; promoted number is:
; 1) same resolution at next higher bpp (up to 8), if available, or
; 2) similar resolution at 8 bpp (8 bpp should be available on most h/w)
;
ModePromoTable
;
; bpp promo mode no.
;
DCB 1, 8 ; 0
DCB 2, 9 ; 1
DCB 4, 10 ; 2
DCB 1, 15 ; 3
DCB 1, 1 ; 4
DCB 2, 2 ; 5
DCB 1, 13 ; 6
DCB 4, 13 ; 7
DCB 2, 12 ; 8
DCB 4, 13 ; 9
DCB 8, &FF ; 10
DCB 2, 14 ; 11
DCB 4, 15 ; 12
DCB 8, &FF ; 13
DCB 4, 15 ; 14
DCB 8, &FF ; 15
DCB 4, 24 ; 16
DCB 4, 24 ; 17
DCB 1, 19 ; 18
DCB 2, 20 ; 19
DCB 4, 21 ; 20
DCB 8, &FF ; 21
DCB 4, 36 ; 22
DCB 1, 28 ; 23
DCB 8, &FF ; 24
DCB 1, 26 ; 25
DCB 2, 27 ; 26
DCB 4, 28 ; 27
DCB 8, &FF ; 28
DCB 1, 30 ; 29
DCB 2, 31 ; 30
DCB 4, 32 ; 31
DCB 8, &FF ; 32
DCB 1, 34 ; 33
DCB 2, 35 ; 34
DCB 4, 36 ; 35
DCB 8, &FF ; 36
DCB 1, 38 ; 37
DCB 2, 39 ; 38
DCB 4, 40 ; 39
DCB 8, &FF ; 40
DCB 1, 42 ; 41
DCB 2, 43 ; 42
DCB 4, 28 ; 43
DCB 1, 45 ; 44
DCB 2, 46 ; 45
DCB 4, 15 ; 46
DCB 8, &FF ; 47
DCB 4, 49 ; 48
DCB 8, &FF ; 49
;
ALIGN
; *****************************************************************************
;
; SYN - Perform MODE change
......@@ -634,6 +700,39 @@ VduBadExit ; jumped to if an error in VDU code
ModeChangeSub ROUT
Push lr
;If it's a common mode number (0..49), consider a possible mode number
;substitution, if hardware does not support given bits per pixel.
;We are vaguely assuming h/w supports at least 8 bpp, otherwise we may
;not be able to find a usable mode number, and later code may not handle
;that well. This is probably ok, 8 bpp is almost universal.
;
CMP r2, #256
BHS mchsub_3
AND r1, r2, #&7F
CMP r1, #50 ; mode number
BHS mchsub_3
Push "r3, r4"
ADR lr, ModePromoTable ; table of mode promotions
LDR r4, [WsPtr, #HWPixelFormats] ; bits 0 to 3 set for 1,2,4,8 bpp supported
mchsub_1
MOV r1, r1, LSL #1
LDRB r3, [lr, r1] ; bpp for this mode number (1,2,4,8)
TST r3, r4 ; supported in h/w?
ANDNE r2, r2, #&80 ; if yes, take mode number that passed
ORRNE r2, r2, r1, LSR #1
BNE mchsub_2
ADD r1, r1, #1 ; else look for promotion
LDRB r1, [lr, r1] ; new mode number
CMP r1, #&FF ; &FF if none
BNE mchsub_1
;alright, don't panic, just try to get a VGA-like mode of any bpp, if not tried already
CMP r1, #28 ; VGA 8 bpp
MOVNE r1, #25 ; VGA 1 bpp
BNE mchsub_1
mchsub_2
Pull "r3, r4"
;
mchsub_3
MOV R1, #Service_PreModeChange
IssueService
TEQ R1, #0 ; was service claimed ?
......
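The promotion walk added in ModeChangeSub above can be modelled in C. The table values below are copied directly from ModePromoTable (bits per pixel, then promoted mode number, 0xFF for none); hwmask has bits 0..3 set for 1/2/4/8 bpp support, matching HWPixelFormats. The give-up guard and the -1 return are illustrative additions; the assembler instead relies on the VGA fallback chain terminating.

```c
#include <assert.h>

/* Model of the mode number substitution walk: follow promotions until
   a mode whose bpp the hardware supports is found. */
struct promo { unsigned char bpp, next; };

static const struct promo table[50] = {
    {1,8},  {2,9},  {4,10}, {1,15}, {1,1},  {2,2},  {1,13}, {4,13}, {2,12}, {4,13},
    {8,0xFF},{2,14},{4,15}, {8,0xFF},{4,15},{8,0xFF},{4,24},{4,24}, {1,19}, {2,20},
    {4,21}, {8,0xFF},{4,36},{1,28}, {8,0xFF},{1,26},{2,27}, {4,28}, {8,0xFF},{1,30},
    {2,31}, {4,32}, {8,0xFF},{1,34},{2,35}, {4,36},{8,0xFF},{1,38}, {2,39}, {4,40},
    {8,0xFF},{1,42},{2,43}, {4,28}, {1,45}, {2,46},{4,15}, {8,0xFF},{4,49}, {8,0xFF}
};

/* Returns the substituted mode number, or -1 if nothing is usable. */
int substitute_mode(int mode, unsigned hwmask)
{
    int tried_vga = 0;
    for (;;) {
        if (table[mode].bpp & hwmask)
            return mode;              /* bpp supported in h/w: use it */
        if (table[mode].next != 0xFF) {
            mode = table[mode].next;  /* follow the promotion chain */
        } else if (!tried_vga) {
            tried_vga = 1;
            mode = 25;                /* last resort: walk the VGA modes */
        } else {
            return -1;                /* no usable mode at all */
        }
    }
}
```

For example, a 1 bpp mode on 8 bpp-only hardware promotes through same-resolution modes of increasing depth until an 8 bpp mode is reached.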
......@@ -783,23 +783,25 @@ FindOKMode ROUT
BNE %FT05
; service claimed
; mjs Kernel/HAL split
; call HAL vetting routine to possibly adjust parameters (or if desperate, to disallow mode)
;;;mjsHAL - is the mode workspace suitably generic to be passed to HAL?
; int HAL_VetMode(void *VIDClist, void *workspace)
;
; VIDClist -> generic video controller list (VIDC list type 3)
; workspace -> mode workspace (if mode number), or 0
; returns 0 if OK (may be minor adjusts to VIDClist and/or workspace values)
; non-zero if not OK
;
; mjs Kernel/HAL split
; call HAL vetting routine to possibly disallow mode
;
Push "r0-r3, r9, r12"
MOV r0,r3
MOV r1,r4
;we'll do the vet on whether h/w supports the pixel depth ourselves
LDR r2,[r0,#VIDCList3_PixelDepth]
MOV r3,#1
MOV r3,r3,LSL r2 ; bits per pixel
LDR r2,[WsPtr,#HWPixelFormats]
TST r3,r2
MOVEQ r0,#1
BEQ %FT04 ; not supported
;now any vet the HAL might want to do
mjsAddressHAL
mjsCallHAL HAL_Video_VetMode
04
CMP r0,#0
Pull "r0-r3,r9,r12"
BNE %FT05 ; HAL says "Oi, Kernel, No!"
......@@ -921,6 +923,13 @@ FindSubstitute Entry
ADD r13, r13, #PushedInfoSize
CMP r11, #4
MOVCS r11, #0
Push "r2, r3"
LDR r2, [WsPtr, #HWPixelFormats] ; see if h/w supports this BPP
MOV r3, #1
MOV r3, r3, LSL r11
TST r2, r3
MOVEQ r11, #3 ; if not, use 8 BPP (assumed best chance for a mode number)
Pull "r2, r3"
LDRB r1, [r1, r11]
CLRV
EXIT
......
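The depth check added to FindSubstitute above amounts to a small amount of logic, sketched here in C (function name invented). r11 in the assembler holds log2 of the bpp (0..3 for 1, 2, 4, 8 bpp), and index 3 (8 bpp) is assumed to give the best chance of a workable substitute mode number.

```c
#include <assert.h>

/* Model of the FindSubstitute depth adjustment: keep the requested
   depth index if HWPixelFormats says the hardware supports it,
   otherwise substitute at 8 bpp (index 3). */
int adjust_depth_index(int log2bpp, unsigned hwmask)
{
    if (hwmask & (1u << log2bpp))
        return log2bpp;   /* depth supported: keep it */
    return 3;             /* else fall back to 8 bpp */
}
```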