1. 07 Nov, 2018 1 commit
    • Attempt to tidy up substitute screen mode selection logic · d87c22a4
      Jeffrey Lee authored
      Detail:
        Over the years the OS's substitute screen mode selection logic has grown to be a tangled mess, and the logic it does implement isn't always very useful. Additionally, the kernel is structured in such a way that it can be hard for modules to override it.
        This set of changes aims to fix many of these problems, by doing the following:
        - Moving all substitute mode selection logic out of the core VDU driver code and into a Service_ModeTranslation handler. This means you now only have one place in the kernel to look instead of several, and modules can override the behaviour by claiming/blocking the service call as appropriate.
        - Moving handling of the built-in VIDC lists out of the core VDU driver code and into a Service_ModeExtension handler. This means programs can now inspect these VIDC lists by issuing the right service call (although you are essentially limited to lists which the GraphicsV driver is OK with)
        - Moving *TV interlace & offset adjustment logic into the Service_ModeExtension handler, since they're legacy things which can be handled more cleanly for MDF/EDID (and the old code was poking memory the kernel didn't own)
        - Adding a Service_EnumerateScreenModes implementation, so that if you end up in the desktop with ScreenModes non-functional, the display manager at least has something useful to show you
        - Enhancing the handling of the built-in numbered modes so that they are now available in any colour depth; the Service_ModeExtension handler (and related handlers) treat the builtin VIDC lists as a set of mode timings, not a discrete set of modes
        - The substitute mode selection logic has been completely rewritten. Instead of trying a handful of numbered fallback modes, it now tries (see the sketch after this list):
          - Same mode but at higher colour depths
          - Same mode but at lower colour depths
          - Alternate resolutions (a half-width mode with no double-pixel if the original request was for double-pixel, and the default resolution for the monitor type)
        - Combined with the logic to allow the builtin VIDC lists to be used at any colour depth, this means that the kernel should now be able to find substitute modes for machines which lack support for <=8bpp modes (e.g. OMAP5)
        - Additionally the mode substitution code will attempt to retain as many properties of the originally requested mode as possible (eigen values, gap mode type, etc.)
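        As a rough illustration of the new search order, the candidate generation looks something like the C sketch below. This is illustrative only: the real implementation is the assembler Service_ModeTranslation handler in s/vdu/legacymodes, and every name and type here (mode_spec, mode_available, find_substitute) is invented for the sketch.
          #include <stdbool.h>

          typedef struct {
              int xres, yres;     /* horizontal/vertical resolution */
              int log2bpp;        /* 0 = 1bpp ... 5 = 32bpp */
              bool double_pixel;
          } mode_spec;

          /* Stand-in for "can the GraphicsV driver display this?" */
          extern bool mode_available(const mode_spec *m);

          static bool try_other_depths(mode_spec m, int orig_log2bpp, mode_spec *out)
          {
              for (int d = orig_log2bpp + 1; d <= 5; d++) {   /* higher depths first */
                  m.log2bpp = d;
                  if (mode_available(&m)) { *out = m; return true; }
              }
              for (int d = orig_log2bpp - 1; d >= 0; d--) {   /* then lower depths */
                  m.log2bpp = d;
                  if (mode_available(&m)) { *out = m; return true; }
              }
              return false;
          }

          bool find_substitute(const mode_spec *req, const mode_spec *monitor_default,
                               mode_spec *out)
          {
              /* Same timings, higher then lower colour depths */
              if (try_other_depths(*req, req->log2bpp, out)) return true;
              /* Half-width, non-double-pixel variant of a double-pixel request */
              if (req->double_pixel) {
                  mode_spec alt = *req;
                  alt.double_pixel = false;
                  alt.xres /= 2;
                  if (mode_available(&alt)) { *out = alt; return true; }
                  if (try_other_depths(alt, alt.log2bpp, out)) return true;
              }
              /* Finally, the default resolution for the monitor type */
              mode_spec def = *monitor_default;
              def.log2bpp = req->log2bpp;
              if (mode_available(&def)) { *out = def; return true; }
              return try_other_depths(def, def.log2bpp, out);
          }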
        Other improvements:
        - The kernel now actually vets the builtin VIDC lists instead of assuming that they'll work (which also means they'll have the correct ExtraBytes value, where applicable)
        - The kernel now uses GraphicsV 19 (VetMode2) to vet the mode during the mode switch process, using the result to detect where the framebuffer will be placed. This allows for GraphicsV drivers to switch between DA 2 and external framestores on a per-mode basis.
        - The kernel now supports mode selectors which specify LineLength values larger than necessary; these are translated to a suitable ExtraBytes control list item, combined with whatever padding the driver indicates is necessary via the VetMode2 result (see the sketch below)
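        For the LineLength handling, the translation amounts to roughly the following (a hedged sketch: the function and parameter names are invented, driver_padding stands for whatever extra per-line padding the VetMode2 result asks for, and rounding for sub-byte depths is glossed over):
          /* Sketch of turning an oversized LineLength into per-line padding. */
          int line_padding_bytes(int xres, int log2bpp,
                                 int requested_line_length, int driver_padding)
          {
              /* Bytes genuinely needed to hold one line of pixels */
              int minimum = (xres << log2bpp) >> 3;
              if (requested_line_length < minimum)
                  return -1;                          /* reject: too small */
              /* Anything beyond the minimum becomes an ExtraBytes-style value,
                 on top of any padding the driver itself requires */
              return (requested_line_length - minimum) + driver_padding;
          }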
        File changes:
        - hdr/KernelWS - Reserve space for a VIDC list, since the Service_ModeExtension implementation typically can't use the built-in list as-is
        - s/Arthur3 - Issue Service_ModeFileChanged when the configured monitor type is changed, so that DisplayManager + friends are aware that the set of available modes has changed
        - s/GetAll - Fiddle with GETs a bit
        - s/MemMap2 - Extra LTORG
        - s/NewIRQs - Small routine to install/uninstall the false VSync routine (previously done from PushModeInfo, which wasn't really the appropriate place for it)
        - s/Utility - Hook up the extra service call handlers
        - s/vdu/legacymodes - New file containing the new service call implementations, and some related code
        - s/vdu/vdudecl - Move mode workspace definition here, from vdumodes
        - s/vdu/vdudriver - Remove assorted bits of mode substitution code. Plug in new bits for calling GraphicsV 19 during mode set, and deal with ExtraBytes/LineLength during PushModeInfo
        - s/vdu/vdumodes - Move some workspace definitions to s/vdu/vdudecl. Tweak how the builtin VIDC lists are stored.
        - s/vdu/vduswis - Rip out more mode substitution code. Issue Service_ModeFileChanged when monitor type is changed by OS_ScreenMode.
      Admin:
        Tested on Raspberry Pi 3, Iyonix, IGEPv5
      
      
      Version 6.14. Tagged as 'Kernel-6_14'
  2. 03 Sep, 2017 1 commit
    • Fix global OS_SynchroniseCodeAreas. ARMop tweaks. · e6c4e3c2
      Jeffrey Lee authored
      Detail:
        s/ExtraSWIs - Fix global OS_SynchroniseCodeAreas using the wrong appspace size; would have resulted in appspace only being partially synced if some pages were mapped out due to lazy swapping
        s/ARMops, s/ExtraSWIs, s/MemMap2 - Simplify code by making DCache_LineLen / ICache_LineLen store the actual line length values on ARMv7+ instead of the log2 values. Optimise SMP I-cache invalidation by allowing it to do a global invalidate. Ensure all ARMv7+ range checks use LO instead of NE, to avoid any problems with mismatched I/D line lengths (we can't be sure the op range was rounded to the larger of the two; see the sketch below)
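        In C-flavoured terms, the range-op pattern (and why LO is the safe termination condition) looks roughly like this; clean_dcache_line is a stand-in for the underlying maintenance instruction, not a real kernel symbol:
          #include <stdint.h>

          extern void clean_dcache_line(uintptr_t addr);

          void clean_dcache_range(uintptr_t start, uintptr_t end, uintptr_t line_len)
          {
              start &= ~(line_len - 1);   /* align down to a cache line boundary */
              do {
                  clean_dcache_line(start);
                  start += line_len;
              } while (start < end);      /* LO, not NE: if 'end' was rounded using a
                                             different (smaller) line length, an
                                             exact-match test could step straight past
                                             it and never terminate */
          }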
      Admin:
        Tested on iMX6
      
      
      Version 5.88, 4.129.2.5. Tagged as 'Kernel-5_88-4_129_2_5'
  3. 13 Dec, 2016 4 commits
    • Implement support for cacheable pagetables · 65fa6a28
      Jeffrey Lee authored
      Detail:
        Modern ARMs (ARMv6+) introduce the possibility for the page table walk hardware to make use of the data cache(s) when performing memory accesses. This can significantly reduce the cost of a TLB miss on the system, and since the accesses are cache-coherent with the CPU it allows us to make the page tables cacheable for CPU (program) accesses also, improving the performance of page table manipulation by the OS.
        Even on ARMs where the page table walk can't use the data cache, it's been measured that page table manipulation operations can still benefit from placing the page tables in write-through or bufferable memory.
        So with that in mind, this set of changes updates the OS to allow cacheable/bufferable page tables to be used by the OS + MMU, using a system-appropriate cache policy.
        File changes:
        - hdr/KernelWS - Allocate workspace for storing the page flags that are to be used by the page tables
        - hdr/OSMem - Re-specify CP_CB_AlternativeDCache as having a different behaviour on ARMv6+ (inner write-through, outer write-back)
        - hdr/Options - Add CacheablePageTables option to allow switching back to non-cacheable page tables if necessary. Add SyncPageTables var which will be set {TRUE} if either the OS or the architecture requires a DSB after writing to a faulting page table entry.
        - s/ARM600, s/VMSAv6 - Add new SetTTBR & GetPageFlagsForCacheablePageTables functions. Update VMSAv6 for wider XCBTable (now 2 bytes per element)
        - s/ARMops - Update pre-ARMv7 MMU_Changing ARMops to drain the write buffer on entry if cacheable pagetables are in use (ARMv7+ already has this behaviour due to architectural requirements). For VMSAv6 Normal memory, change the way that the OS encodes the cache policy in the page table entries so that it's more compatible with the encoding used in the TTBR.
        - s/ChangeDyn - Update page table page flag handling to use PageTable_PageFlags. Make use of new PageTableSync macro.
        - s/Exceptions, s/AMBControl/memmap - Make use of new PageTableSync macro.
        - s/HAL - Update MMU initialisation sequence to make use of PageTable_PageFlags + SetTTBR
        - s/Kernel - Add PageTableSync macro, to be used after any write to a faulting page table entry (see the sketch after this list)
        - s/MemInfo - Update OS_Memory 0 page flag conversion. Update OS_Memory 24 to use new symbol for page table access permissions.
        - s/MemMap2 - Use PageTableSync. Add routines to enable/disable cacheable pagetables
        - s/NewReset - Enable cacheable pagetables once we're fully clear of the MMU initialisation sequence (doing this earlier would be trickier due to potential double-mapping)
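        For reference, the PageTableSync usage pattern boils down to roughly the following (a sketch only, assuming an ARMv7 target and GCC-style inline assembler; the kernel's real macro is assembler in s/Kernel and is conditional on the SyncPageTables flag):
          #include <stdint.h>

          static inline void page_table_sync(void)
          {
              /* Make the page table write visible to the table walk hardware */
              __asm__ volatile("dsb" ::: "memory");
          }

          void map_page(volatile uint32_t *pte, uint32_t new_entry)
          {
              *pte = new_entry;   /* write the (previously faulting) entry */
              page_table_sync();  /* needed before anything relies on the new mapping */
          }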
      Admin:
        Tested on pretty much everything currently supported
        Delivers moderate performance benefits to page table ops on old systems (e.g. 10% faster), astronomical benefits on some new systems (up to 8x faster)
        Stats: https://www.riscosopen.org/forum/forums/3/topics/2728?page=2#posts-58015
      
      
      Version 5.71. Tagged as 'Kernel-5_71'
    • Make MMU_Changing ARMops perform the sub-operations in a sensible order · 9a96263a
      Jeffrey Lee authored
      Detail:
        For a while we've known that the correct way of doing cache maintenance on ARMv6+ (e.g. when converting a page from cacheable to non-cacheable) is as follows:
        1. Write new page table entry
        2. Flush old entry from TLB
        3. Clean cache + drain write buffer
        The MMU_Changing ARMops (e.g. MMU_ChangingEntry) implement the last two items, but in the wrong order. This has caused the operations to fall out of favour and cease to be used, even in pre-ARMv6 code paths where the effects of improper cache/TLB management perhaps weren't as readily visible.
        This change re-specifies the relevant ARMops so that they perform their sub-operations in the correct order to make them useful on modern ARMs, updates the implementations, and updates the kernel to make use of the ops wherever relevant (the corrected sequence is sketched below).
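        Sketched in C (illustrative names only; the real implementations are the assembler ARMops in s/ARMops), converting a page from cacheable to non-cacheable now follows this shape:
          #include <stdint.h>

          extern void write_page_table_entry(uintptr_t page, uint32_t flags);
          extern void tlb_invalidate_entry(uintptr_t page);
          extern void dcache_clean_invalidate_range(uintptr_t addr, uintptr_t size);
          extern void drain_write_buffer(void);

          void make_page_uncacheable(uintptr_t page, uint32_t uncached_flags)
          {
              /* 1. Write the new page table entry first */
              write_page_table_entry(page, uncached_flags);
              /* 2. Flush the old entry from the TLB, so no further cacheable
                    accesses can be made through the stale translation */
              tlb_invalidate_entry(page);
              /* 3. Clean the cache and drain the write buffer. Steps 2 and 3, in
                    this order, are what the re-specified MMU_ChangingEntry op does */
              dcache_clean_invalidate_range(page, 4096);
              drain_write_buffer();
          }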
        File changes:
        - Docs/HAL/ARMop_API - Re-specify all the MMU_Changing ARMops to state that they are for use just after a page table entry has been changed (as opposed to before - e.g. 5.00 kernel behaviour). Re-specify the cacheable ones to state that the TLB invalidation comes first.
        - s/ARM600, s/ChangeDyn, s/HAL, s/MemInfo, s/VMSAv6, s/AMBControl/memmap - Replace MMU_ChangingUncached + Cache_CleanInvalidate pairs with equivalent MMU_Changing op
        - s/ARMops - Update ARMop implementations to do everything in the correct order
        - s/MemMap2 - Update ARMop usage, and get rid of some lingering sledgehammer logic from ShuffleDoublyMappedRegionForGrow
      Admin:
        Tested on pretty much everything currently supported
      
      
      Version 5.70. Tagged as 'Kernel-5_70'
    • Place restrictions on the use of cacheable doubly-mapped DAs · 2704c756
      Jeffrey Lee authored
      Detail:
        The kernel has always allowed software to create cacheable doubly-mapped DAs, despite the fact that the VIVT caches used on ARMv5 and below would have no way of keeping both of the mappings coherent
        This change places the following restrictions on doubly-mapped areas, to ensure that cache settings which can't be supported by the cache architecture of the CPU can't be selected (summarised in the sketch below):
        * On ARMv6 and below, cacheable doubly-mapped areas aren't supported.
          * Although ARMv6 has VIPT data caches, it's also subject to page colouring constraints which would require us to force the DA size to be a multiple of 16k. So for now keep things simple and disallow cacheable doubly-mapped areas on ARMv6.
        * On ARMv7 and above, cacheable doubly-mapped areas are allowed, but only if they are marked non-executable
          * The blocker to allowing executable cacheable doubly-mapped areas is the VIPT instruction caches; OS_SynchroniseCodeAreas (or callers of it) would need to know that a doubly-mapped area is in use so that they can flush both mappings from the I-cache. Although some chips do have PIPT instruction caches, again it isn't really worth supporting executable cacheable doubly-mapped areas at the moment.
        These changes also allow us to get rid of the expensive 'sledgehammer' logic when dealing with doubly-mapped areas
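        The resulting policy can be summarised roughly as follows (a sketch of the rule only; the names are invented, and the real checks operate on OS_DynamicArea flags in s/ChangeDyn):
          #include <stdbool.h>

          typedef struct {
              bool doubly_mapped;
              bool cacheable;
              bool executable;
          } da_request;

          /* Returns true if the requested combination is permitted */
          bool doubly_mapped_da_allowed(const da_request *da, int arm_arch)
          {
              if (!da->doubly_mapped || !da->cacheable)
                  return true;            /* restrictions only apply to cacheable
                                             doubly-mapped areas */
              if (arm_arch <= 6)
                  return false;           /* VIVT caches (and ARMv6 page colouring)
                                             rule it out entirely */
              return !da->executable;     /* ARMv7+: allowed, but only non-executable
                                             (because of VIPT I-caches) */
          }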
        File changes:
        - s/ARM600, s/VMSAv6 - Remove the sledgehammer logic, only perform cache/TLB maintenance for the required areas
        - s/ChangeDyn - Implement the required checks
        - s/MemMap2 - Move some cache maintenance logic into RemoveCacheabilityR0ByMinusR2, which previously would have had to be performed by the caller due to the sledgehammer paranoia
      Admin:
        Cacheable doubly-mapped DAs tested on iMX6 (tried making screen memory write-through cacheable; decent performance gain seen)
        Note OS_Memory 0 "make temporarily uncacheable" doesn't work on doubly-mapped areas, so cacheable doubly-mapped areas are not yet safe for general DMA
      
      
      Version 5.69. Tagged as 'Kernel-5_69'
    • Make s/ChangeDyn slightly more readable by splitting some routines out into a separate file · 4a6150dc
      Jeffrey Lee authored
      Detail:
        s/MemMap2 - New file containing assorted low-level memory mapping routines taken from s/ChangeDyn. N.B. There's no special significance to this being named "MemMap2", it's just a name that stuck due to some earlier (abandoned) changes which added a file named "MemMap".
        s/ChangeDyn - Remove the routines/chunks of code that were moved to s/MemMap2. Also some duplicate code removal (the regular DA grow code and DoTheGrowNotSpecified now rely on the new DoTheGrowCommon routine for doing the actual grow)
        s/GetAll - GET s/MemMap2 at an appropriate time
      Admin:
        Tested on pretty much everything currently supported
      
      
      Version 5.67. Tagged as 'Kernel-5_67'