Commit 6a293f53 authored by Mike Stephens's avatar Mike Stephens

kernel now attempts to substitute video mode numbers in face of h/w with limited bits-per-pixel support (not tested yet)

HAL_API document added - early draft only, of interest to those
writing or modifying HALs for new h/w
ARMop_API document added - early draft only, of interest only
to those modifying kernel to support new ARM cores
*** polite comments on HAL_API welcome ***

Version 5.35, 4.79.2.15. Tagged as 'Kernel-5_35-4_79_2_15'
parent cdf980ed
12345678901234567890123456789012345678901234567890123456789012345678901234567890
mjs 12 Jan 2001 Early Draft
RISC OS Kernel ARM core support
===============================
This document is concerned with the design of open-ended support for
multiple ARM cores within the RISC OS kernel, as part of the work loosely
termed hardware abstraction. Note that the ARM core support is part of the
OS kernel, and so is not part of the hardware abstraction layer (HAL)
itself.
Background
----------
ARM core support (including caches and MMU) has historically been coded in a
tailored way for one or two specific variants. Since version 3.7 this has
meant just two variants; ARM 6/7 and StrongARM SA110. A more generic
approach is required for the next generation. This aims both to support
several cores in a more structured way, and to cover minor variants (eg.
cache size) with the same support code. The natural approach is to set up
run-time vectors to a set of ARM support routines.
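As a sketch, the run-time vectors might look like the following C model. All names here are invented for illustration; the real kernel binds ARM assembler routines (not C functions) into its vectors at boot, after identifying the core from the CP15 ID register.

```c
#include <assert.h>
#include <string.h>

/* Illustrative model of run-time vectored ARM support routines: a
   struct of function pointers is filled in once at boot, and all
   later kernel code dispatches through it.  Strings stand in for the
   real CP15 cache/TLB work. */
typedef struct {
    const char *(*cache_clean_invalidate_all)(void);
    const char *(*tlb_invalidate_all)(void);
} ARMops;

static const char *arm9_cache_op(void)  { return "clean+invalidate by index/segment"; }
static const char *sa110_cache_op(void) { return "clean by reading a dummy area"; }
static const char *generic_tlb_op(void) { return "invalidate whole TLB"; }

ARMops armops;   /* the vectors the rest of the kernel calls through */

void armops_init(int is_strongarm)
{
    /* In reality this selection would key off the CP15 ID register. */
    armops.cache_clean_invalidate_all =
        is_strongarm ? sa110_cache_op : arm9_cache_op;
    armops.tlb_invalidate_all = generic_tlb_op;
}
```

The point of the indirection is that minor core variants need only rebind one or two vectors rather than requiring a new build of the kernel.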
Note that it is currently assumed that the ARM MMU architecture will not
change radically in future ARM cores. Hence, the kernel memory management
algorithms remain largely unchanged. This is believed to be a reasonable
assumption, since the last major memory management change was with Risc PC
and ARM 610 (when the on-chip MMU was introduced).
Note that all ARM support code must be 32-bit clean, as part of the 32-bit
clean kernel.
Survey of ARM core requirements
-------------------------------
At present, five broad ARM core types can be considered to be of interest;
ARM7 (and ARM6), ARM9, ARM10, StrongARM (SA1) and XScale. These divide
primarily in terms of cache types, and cache and TLB maintenance
requirements. They also span a range of defined ARM architecture versions,
which introduce differences in the system control operations (primarily
coprocessor 15 instructions).
The current ARM architecture is version 5. This (and version 4) has some
open-ended definitions that allow code to determine cache sizes and types from
CP15 registers. Hence, the design of the support code can hope to be at
least tolerant of near future variations that are introduced.
ARM7
----
ARM7 cores may be architecture 3 or 4. They differ in required coprocessor
15 operations for the same cache and TLB control. ARM6 cores are much the
same as architecture 3 ARM7. The general character of all these cores is of
unified write-through caches that can only be invalidated on a global basis.
The TLBs are also unified, and can be invalidated per entry or globally.
ARM9
----
ARM9 cores are architecture 4. We ignore ARM9 variants without an MMU. The
kernel can read cache size and features. The ARM 920 or 922 have Harvard
caches, with writeback and writethrough capable data caches (on a page or
section granularity). Data and instruction caches can be invalidated by
individual lines or globally. The data cache can be cleaned by virtual
address or cache segment/index, allowing for efficient cache maintenance.
Data and instruction TLBs can be invalidated by entry or globally.
ARM10
-----
ARM 10 is architecture 5. Few details available at present. Likely to be
similar to ARM9 in terms of cache features and available operations.
StrongARM
---------
StrongARM is architecture 4. StrongARMs have Harvard caches, the data cache
being writeback only (no writethrough option). The data cache can only be
globally cleaned in an indirect manner, by reading from otherwise unused
address space. This is inefficient because it requires external (to the
core) reads on the bus. In particular, the minimum cost of a clean, for a
nearly clean cache, is high. The data cache supports clean and invalidate by
individual virtual lines, so this is reasonably efficient for small ranges
of address. The data TLB can be invalidated by entry or globally.
The instruction cache can only be invalidated globally. This is inefficient
for cases such as IMBs over a small range (dynamic code). The instruction
TLB can only be invalidated globally.
Some StrongARM variants have a mini data cache. This is selected over the
main cache on a section or page basis by setting the cachable/bufferable
bits to C=1,B=0 in the MMU (this is not standard ARM architecture). The mini data
cache is writeback and must be cleaned in the same manner as the main data
cache.
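The cost of the StrongARM indirect clean can be seen from a small C model. The geometry used here (16KB data cache, 32-byte lines) matches the SA-110; the function merely counts the loads the clean performs, since every line of the otherwise-unused "dirty" area must be read regardless of how little of the cache is actually dirty.

```c
#include <assert.h>

/* Model of the SA-110 indirect global data cache clean: one load per
   cache line across the whole cache-sized dummy area.  Each load can
   evict (and so write back) one potentially dirty line. */
enum { DCACHE_SIZE = 16 * 1024,   /* SA-110 data cache size */
       LINE_SIZE   = 32 };        /* bytes per cache line */

/* Returns how many loads (external bus reads) a full clean costs.
   Note this is a fixed cost: a nearly clean cache pays it all. */
unsigned sa_clean_load_count(void)
{
    unsigned loads = 0;
    for (unsigned addr = 0; addr < DCACHE_SIZE; addr += LINE_SIZE)
        loads++;                  /* models: LDR from dummy_area + addr */
    return loads;
}
```

This fixed minimum cost is exactly why the document flags the StrongARM global clean as inefficient compared with the per-line clean used for small address ranges.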
XScale
------
XScale is architecture 5. It implements Harvard caches, the data cache being
writeback or writethrough (on a page or section granularity). Data and
instruction caches can be invalidated by individual lines or globally. The
data cache can be fully cleaned by allocating lines from otherwise unused
address space. Unlike StrongARM, no external reads are needed for the clean
operation, so that cache maintenance is efficient.
XScale has a mini data cache. This is only available by using extension bits
in the MMU. This extension is not documented in the current manual for
architecture 5, but will presumably be properly recognised by ARM. It should
be a reasonably straightforward extension for RISC OS. The mini data cache
can only be cleaned by inefficient indirect reads as on StrongARM. However,
for XScale, the whole mini data cache can be configured as writethrough to
obviate this problem. The most likely use for RISC OS is to map screen
memory as mini cacheable, when writethrough caching will also be highly
desirable to prevent delayed screen update.
The instruction and data TLBs can each be invalidated by entry or globally.
Kernel ARM operations
---------------------
This section lists the definitions and API of the set of ARM operations
required by the kernel for each major ARM type that is to be supported. Some
operations may be very simple on some ARMs. Others may need support from the
kernel environment - for example, readable parameters that have been
determined at boot, or address space available for cache clean operations.
The general rules for register usage and preservation in calling these
operations are:
- any parameters are passed in r0, r1 etc. as required
- r0 may be used as a scratch register
- the routines see a valid stack via sp; at least 16 words are available
- lr is the return link as required
- on exit, all registers except r0 and lr must be preserved
Note that where register values are given as logical addresses, these are
RISC OS logical addresses. The equivalent ARM terminology is virtual address
(VA), or modified virtual address (MVA) for architectures with the fast
context switch extension.
Note also that where cache invalidation is required, it is implicit that any
associated operations for a particular ARM should be performed as well. The
most obvious example is an ARM with branch prediction, where it may be
necessary to invalidate a branch cache wherever instruction cache
invalidation is performed.
Any operation that is a null operation on the given ARM should be
implemented as a single return instruction:
MOV pc, lr
-- Cache_CleanInvalidateAll
The cache or caches are to be globally invalidated, with cleaning of any
writeback data being properly performed.
entry: -
exit: -
IRQs are enabled
call is not reentrant
Note that any write buffer draining should also be performed by this
operation, so that memory is fully updated with respect to any writeback
data.
The OS only expects the invalidation to be with respect to instructions/data
that are not involved in any currently active interrupts. In other words, it
is expected and desirable that interrupts remain enabled during any extended
clean operation, in order to avoid impact on interrupt latency.
-- Cache_CleanAll
The unified cache or data cache is to be globally cleaned (any writeback data
updated to memory). Invalidation is not required.
entry: -
exit: -
IRQs are enabled
call is not reentrant
Note that any write buffer draining should also be performed by this
operation, so that memory is fully updated with respect to any writeback
data.
The OS only expects the cleaning to be with respect to data that are not
involved in any currently active interrupts. In other words, it is expected
and desirable that interrupts remain enabled during any extended clean
operation, in order to avoid impact on interrupt latency.
-- Cache_InvalidateAll
The cache or caches are to be globally invalidated. Cleaning of any writeback
data is not to be performed.
entry: -
exit: -
IRQs are enabled
call is not reentrant
This call is only required for special restart use, since it implies that
any writeback data are either irrelevant or not valid. It should be a very
simple operation on all ARMs.
-- Cache_RangeThreshold
Return a threshold value for an address range, above which it is advisable
to globally clean and/or invalidate caches, for performance reasons. For a
range less than or equal to the threshold, a ranged cache operation is
recommended.
entry: -
exit: r0 = threshold value (bytes)
IRQs are enabled
call is not reentrant
This call returns a value that the kernel may use to select between strategies
in some cache operations. This threshold may also be of use to some of the
ARM operations themselves (although they should typically be able to read
the parameter more directly).
The exact value is unlikely to be critical, but a sensible value may depend
on both the ARM and external factors such as memory bus speed.
-- TLB_InvalidateAll
The TLB or TLBs are to be globally invalidated.
entry: -
exit: -
IRQs are enabled
call is not reentrant
-- TLB_InvalidateEntry
The TLB or TLBs are to be invalidated for the entry at the given logical
address.
entry: r0 = logical address of entry to invalidate (page aligned)
exit: -
IRQs are enabled
call is not reentrant
The address will always be page aligned (4k).
-- WriteBuffer_Drain
Any writebuffers are to be drained so that any pending writes are guaranteed
completed to memory.
entry: -
exit: -
IRQs are enabled
call is not reentrant
-- IMB_Full
A global instruction memory barrier (IMB) is to be performed.
entry: -
exit: -
IRQs are enabled
call is not reentrant
An IMB is an operation that should be performed after new instructions have
been stored and before they are executed. It guarantees correct operation
for code modification (eg. something as simple as loading code to be
executed).
On some ARMs, this operation may be null. On ARMs with a Harvard
architecture, this typically consists of:
1) clean data cache
2) drain write buffer
3) invalidate instruction cache
There may be other considerations such as invalidating branch prediction
caches.
-- IMB_Range
An instruction memory barrier (IMB) is to be performed over a logical
address range.
entry: r0 = logical address of start of range
r1 = logical address of end of range (exclusive)
Note that r0 and r1 are aligned on cache line boundaries
exit: -
IRQs are enabled
call is not reentrant
An IMB is an operation that should be performed after new instructions have
been stored and before they are executed. It guarantees correct operation
for code modification (eg. something as simple as loading code to be
executed).
On some ARMs, this operation may be null. On ARMs with a Harvard
architecture, this typically consists of:
1) clean data cache over the range
2) drain write buffer
3) invalidate instruction cache over the range
There may be other considerations such as invalidating branch prediction
caches.
Note that the range may be very large. The implementation of this call is
typically expected to use a threshold (related to Cache_RangeThreshold) to
decide when to perform IMB_Full instead, which is faster for large ranges.
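The threshold decision reduces to a comparison; a C model follows. The threshold value and the names here are illustrative only; a real implementation would take the value from Cache_RangeThreshold (or read the boot-time parameter directly).

```c
#include <assert.h>
#include <stdint.h>

/* Model of the IMB_Range strategy selection: small ranges get the
   per-line clean/drain/invalidate sequence, large ranges fall back to
   the global IMB_Full, which is cheaper above some threshold. */
enum { RANGE_THRESHOLD = 8 * 1024 };   /* bytes; assumed value */

typedef enum { IMB_BY_RANGE, IMB_FULL } imb_strategy;

imb_strategy imb_choose(uint32_t start, uint32_t end)
{
    /* start/end are cache-line aligned; end is exclusive */
    return (end - start <= RANGE_THRESHOLD) ? IMB_BY_RANGE : IMB_FULL;
}
```

The same pattern applies to MMU_ChangingEntries below, where the count of page entries (rather than a byte range) is compared against a threshold before falling back to the global operation.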
-- MMU_Changing
The global MMU mapping is about to be changed.
entry: -
exit: -
IRQs are enabled
call is not reentrant
The operation must typically perform the following:
1) globally clean and invalidate all caches
2) drain write buffer
3) globally invalidate TLB or TLBs
Note that it should not be necessary to disable IRQs. The OS ensures that
remappings do not affect currently active interrupts.
-- MMU_ChangingEntry
The MMU mapping is about to be changed for a single page entry (4k).
entry: r0 = logical address of entry (page aligned)
exit: -
IRQs are enabled
call is not reentrant
The operation must typically perform the following:
1) clean and invalidate all caches over the 4k range of the page
2) drain write buffer
3) invalidate TLB or TLBs for the entry
Note that it should not be necessary to disable IRQs. The OS ensures that
remappings do not affect currently active interrupts.
-- MMU_ChangingUncached
The MMU mapping is about to be changed in a way that globally affects
uncacheable space.
entry: -
exit: -
IRQs are enabled
call is not reentrant
The operation must typically globally invalidate the TLB or TLBs. The OS
guarantees that cacheable space is not affected, so cache operations are not
required. However, there may still be considerations such as fill buffers
that operate in uncacheable space on some ARMs.
-- MMU_ChangingUncachedEntry
The MMU mapping is about to be changed for a single uncacheable page entry
(4k).
entry: r0 = logical address of entry (page aligned)
exit: -
IRQs are enabled
call is not reentrant
The operation must typically invalidate the TLB or TLBs for the entry. The
OS guarantees that cacheable space is not affected, so cache operations are
not required. However, there may still be considerations such as fill
buffers that operate in uncacheable space on some ARMs.
-- MMU_ChangingEntries
The MMU mapping is about to be changed for a contiguous range of page
entries (multiple of 4k).
entry: r0 = logical address of first page entry (page aligned)
r1 = number of page entries ( >= 1)
exit: -
IRQs are enabled
call is not reentrant
The operation must typically perform the following:
1) clean and invalidate all caches over the range of the pages
2) drain write buffer
3) invalidate TLB or TLBs over the range of the entries
Note that it should not be necessary to disable IRQs. The OS ensures that
remappings do not affect currently active interrupts.
Note that the number of entries may be large. The operation is typically
expected to use a reasonable threshold, above which it performs a global
operation instead for speed reasons.
-- MMU_ChangingUncachedEntries
The MMU mapping is about to be changed for a contiguous range of uncacheable
page entries (multiple of 4k).
entry: r0 = logical address of first page entry (page aligned)
r1 = number of page entries ( >= 1)
exit: -
IRQs are enabled
call is not reentrant
The operation must typically invalidate the TLB or TLBs over the range of
the entries. The OS guarantees that cacheable space is not affected, so
cache operations are not required. However, there may still be
considerations such as fill buffers that operate in uncacheable space on
some ARMs.
Note that the number of entries may be large. The operation is typically
expected to use a reasonable threshold, above which it performs a global
operation instead for speed reasons.
......@@ -13,12 +13,12 @@
GBLS Module_ComponentPath
Module_MajorVersion SETS "5.35"
Module_Version SETA 535
Module_MinorVersion SETS "4.79.2.14"
Module_Date SETS "09 Jan 2001"
Module_ApplicationDate2 SETS "09-Jan-01"
Module_ApplicationDate4 SETS "09-Jan-2001"
Module_MinorVersion SETS "4.79.2.15"
Module_Date SETS "12 Jan 2001"
Module_ApplicationDate2 SETS "12-Jan-01"
Module_ApplicationDate4 SETS "12-Jan-2001"
Module_ComponentName SETS "Kernel"
Module_ComponentPath SETS "RiscOS/Sources/Kernel"
Module_FullVersion SETS "5.35 (4.79.2.14)"
Module_HelpVersion SETS "5.35 (09 Jan 2001) 4.79.2.14"
Module_FullVersion SETS "5.35 (4.79.2.15)"
Module_HelpVersion SETS "5.35 (12 Jan 2001) 4.79.2.15"
END
......@@ -4,19 +4,19 @@
*
*/
#define Module_MajorVersion_CMHG 5.35
#define Module_MinorVersion_CMHG 4.79.2.14
#define Module_Date_CMHG 09 Jan 2001
#define Module_MinorVersion_CMHG 4.79.2.15
#define Module_Date_CMHG 12 Jan 2001
#define Module_MajorVersion "5.35"
#define Module_Version 535
#define Module_MinorVersion "4.79.2.14"
#define Module_Date "09 Jan 2001"
#define Module_MinorVersion "4.79.2.15"
#define Module_Date "12 Jan 2001"
#define Module_ApplicationDate2 "09-Jan-01"
#define Module_ApplicationDate4 "09-Jan-2001"
#define Module_ApplicationDate2 "12-Jan-01"
#define Module_ApplicationDate4 "12-Jan-2001"
#define Module_ComponentName "Kernel"
#define Module_ComponentPath "RiscOS/Sources/Kernel"
#define Module_FullVersion "5.35 (4.79.2.14)"
#define Module_HelpVersion "5.35 (09 Jan 2001) (4.79.2.14)"
#define Module_FullVersion "5.35 (4.79.2.15)"
#define Module_HelpVersion "5.35 (12 Jan 2001) (4.79.2.15)"
......@@ -2392,6 +2392,7 @@ MMUControl_Flush
TST r0,#&80000000
BEQ MMUC_flush_flushT
ARMop Cache_CleanInvalidateAll,,,r1
LDR r0, [sp]
MMUC_flush_flushT
TST r0,#&40000000
BEQ MMUC_flush_done
......
......@@ -168,15 +168,12 @@ VduInit ROUT
STR r0, [r4, #HWPixelFormats]
mjsCallHAL HAL_Video_Features
STR r0, [r4, #HWVideoFeatures]
mjsCallHAL HAL_Video_Features
STR r0, [r4, #HWPixelFormats]
mjsCallHAL HAL_Video_BufferAlignment
STR r0, [r4, #HWBufferAlign]
Pull "r4, r9, r12"
;;; sort this out!
! 0, "mjsHAL not doing anything useful with HAL_Video_PixelFormats"
! 0, "mjsHAL not doing anything useful with HAL_Video_bufferAlign"
! 0, "mjsHAL not doing anything useful with HAL_Video_BufferAlignment"
! 0, "mjsHAL not dealing with lack of h/w pointer"
LDR R0, =RangeC+SpriteReason_SwitchOutputToSprite
......@@ -607,6 +604,75 @@ CursorNbitTab
& Cursor16bit-CursorNbitTab
& Cursor32bit-CursorNbitTab
; table of substitute mode numbers to cater for hardware that might
; not support all of 1,2,4,8 bpp (bits per pixel) modes
;
; indexed by mode number (0..49), pairs of byte values:
; bpp = bits per pixel of this mode number
; promo = promoted mode number (0..49), or &FF if none
;
; promoted number is:
; 1) same resolution at next higher bpp (up to 8), if available, or
; 2) similar resolution at 8 bpp (8 bpp should be available on most h/w)
;
ModePromoTable
;
; bpp promo mode no.
;
DCB 1, 8 ; 0
DCB 2, 9 ; 1
DCB 4, 10 ; 2
DCB 1, 15 ; 3
DCB 1, 1 ; 4
DCB 2, 2 ; 5
DCB 1, 13 ; 6
DCB 4, 13 ; 7
DCB 2, 12 ; 8
DCB 4, 13 ; 9
DCB 8, &FF ; 10
DCB 2, 14 ; 11
DCB 4, 15 ; 12
DCB 8, &FF ; 13
DCB 4, 15 ; 14
DCB 8, &FF ; 15
DCB 4, 24 ; 16
DCB 4, 24 ; 17
DCB 1, 19 ; 18
DCB 2, 20 ; 19
DCB 4, 21 ; 20
DCB 8, &FF ; 21
DCB 4, 36 ; 22
DCB 1, 28 ; 23
DCB 8, &FF ; 24
DCB 1, 26 ; 25
DCB 2, 27 ; 26
DCB 4, 28 ; 27
DCB 8, &FF ; 28
DCB 1, 30 ; 29
DCB 2, 31 ; 30
DCB 4, 32 ; 31
DCB 8, &FF ; 32
DCB 1, 34 ; 33
DCB 2, 35 ; 34
DCB 4, 36 ; 35
DCB 8, &FF ; 36
DCB 1, 38 ; 37
DCB 2, 39 ; 38
DCB 4, 40 ; 39
DCB 8, &FF ; 40
DCB 1, 42 ; 41
DCB 2, 43 ; 42
DCB 4, 28 ; 43
DCB 1, 45 ; 44
DCB 2, 46 ; 45
DCB 4, 15 ; 46
DCB 8, &FF ; 47
DCB 4, 49 ; 48
DCB 8, &FF ; 49
;
ALIGN
; *****************************************************************************
;
; SYN - Perform MODE change
......@@ -634,6 +700,39 @@ VduBadExit ; jumped to if an error in VDU code
ModeChangeSub ROUT
Push lr
;If it's a common mode number (0..49), consider a possible mode number
;substitution, if hardware does not support given bits per pixel.
;We are vaguely assuming h/w supports at least 8 bpp, otherwise we may
;not be able to find a usable mode number, and later code may not handle
;that well. This is probably ok, 8 bpp is almost universal.
;
CMP r2, #256
BHS mchsub_3
AND r1, r2, #&7F
CMP r1, #50 ; mode number
BHS mchsub_3
Push "r3, r4"
ADR lr, ModePromoTable ; table of mode promotions
LDR r4, [WsPtr, #HWPixelFormats] ; bits 0 to 3 set for 1,2,4,8 bpp supported
mchsub_1
MOV r1, r1, LSL #1
LDRB r3, [lr, r1] ; bpp for this mode number (1,2,4,8)
TST r3, r4 ; supported in h/w?
ANDNE r2, r2, #&80 ; if yes, take mode number that passed
ORRNE r2, r2, r1, LSR #1
BNE mchsub_2
ADD r1, r1, #1 ; else look for promotion
LDRB r1, [lr, r1] ; new mode number
CMP r1, #&FF ; &FF if none
BNE mchsub_1
;alright, don't panic, just try to get a VGA-like mode of any bpp, if not tried already
CMP r1, #28 ; VGA 8 bpp
MOVNE r1, #25 ; VGA 1 bpp
BNE mchsub_1
mchsub_2
Pull "r3, r4"
;
mchsub_3
MOV R1, #Service_PreModeChange
IssueService
TEQ R1, #0 ; was service claimed ?
......
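The promotion walk added in ModeChangeSub above can be modelled in C. The table values below are copied directly from ModePromoTable (bits per pixel, then promoted mode number, 0xFF for none); hwmask has bits 0..3 set for 1/2/4/8 bpp support, matching HWPixelFormats. The give-up guard and the -1 return are illustrative additions; the assembler instead relies on the VGA fallback chain terminating.

```c
#include <assert.h>

/* Model of the mode number substitution walk: follow promotions until
   a mode whose bpp the hardware supports is found. */
struct promo { unsigned char bpp, next; };

static const struct promo table[50] = {
    {1,8},  {2,9},  {4,10}, {1,15}, {1,1},  {2,2},  {1,13}, {4,13}, {2,12}, {4,13},
    {8,0xFF},{2,14},{4,15}, {8,0xFF},{4,15},{8,0xFF},{4,24},{4,24}, {1,19}, {2,20},
    {4,21}, {8,0xFF},{4,36},{1,28}, {8,0xFF},{1,26},{2,27}, {4,28}, {8,0xFF},{1,30},
    {2,31}, {4,32}, {8,0xFF},{1,34},{2,35}, {4,36},{8,0xFF},{1,38}, {2,39}, {4,40},
    {8,0xFF},{1,42},{2,43}, {4,28}, {1,45}, {2,46},{4,15}, {8,0xFF},{4,49}, {8,0xFF}
};

/* Returns the substituted mode number, or -1 if nothing is usable. */
int substitute_mode(int mode, unsigned hwmask)
{
    int tried_vga = 0;
    for (;;) {
        if (table[mode].bpp & hwmask)
            return mode;              /* bpp supported in h/w: use it */
        if (table[mode].next != 0xFF) {
            mode = table[mode].next;  /* follow the promotion chain */
        } else if (!tried_vga) {
            tried_vga = 1;
            mode = 25;                /* last resort: walk the VGA modes */
        } else {
            return -1;                /* no usable mode at all */
        }
    }
}
```

For example, a 1 bpp mode on 8 bpp-only hardware promotes through same-resolution modes of increasing depth until an 8 bpp mode is reached.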
......@@ -783,23 +783,25 @@ FindOKMode ROUT
BNE %FT05
; service claimed
; mjs Kernel/HAL split
; call HAL vetting routine to possibly adjust parameters (or if desperate, to disallow mode)
;;;mjsHAL - is the mode workspace suitably generic to be passed to HAL?
; int HAL_VetMode(void *VIDClist, void *workspace)
;
; VIDClist -> generic video controller list (VIDC list type 3)
; workspace -> mode workspace (if mode number), or 0
; returns 0 if OK (may be minor adjusts to VIDClist and/or workspace values)
; non-zero if not OK
;
; mjs Kernel/HAL split
; call HAL vetting routine to possibly disallow mode
;
Push "r0-r3, r9, r12"
MOV r0,r3
MOV r1,r4
;we'll do the vet on whether h/w supports the pixel depth ourselves
LDR r2,[r0,#VIDCList3_PixelDepth]
MOV r3,#1
MOV r3,r3,LSL r2 ; bits per pixel
LDR r2,[WsPtr,#HWPixelFormats]
TST r3,r2
MOVEQ r0,#1
BEQ %FT04 ; not supported
;now any vet the HAL might want to do
mjsAddressHAL
mjsCallHAL HAL_Video_VetMode
04
CMP r0,#0
Pull "r0-r3,r9,r12"
BNE %FT05 ; HAL says "Oi, Kernel, No!"
......@@ -921,6 +923,13 @@ FindSubstitute Entry
ADD r13, r13, #PushedInfoSize
CMP r11, #4
MOVCS r11, #0
Push "r2, r3"
LDR r2, [WsPtr, #HWPixelFormats] ; see if h/w supports this BPP
MOV r3, #1
MOV r3, r3, LSL r11
TST r2, r3
MOVEQ r11, #3 ; if not, use 8 BPP (assumed best chance for a mode number)
Pull "r2, r3"
LDRB r1, [r1, r11]
CLRV
EXIT
......
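The depth check added to FindSubstitute above amounts to a small amount of logic, sketched here in C (function name invented). r11 in the assembler holds log2 of the bpp (0..3 for 1, 2, 4, 8 bpp), and index 3 (8 bpp) is assumed to give the best chance of a workable substitute mode number.

```c
#include <assert.h>

/* Model of the FindSubstitute depth adjustment: keep the requested
   depth index if HWPixelFormats says the hardware supports it,
   otherwise substitute at 8 bpp (index 3). */
int adjust_depth_index(int log2bpp, unsigned hwmask)
{
    if (hwmask & (1u << log2bpp))
        return log2bpp;   /* depth supported: keep it */
    return 3;             /* else fall back to 8 bpp */
}
```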