1. 28 Apr, 2021 3 commits
    • Jeffrey Lee's avatar
      Support runtime selection of pagetable format · ba993cb5
      Jeffrey Lee authored
      Runtime selection between long descriptor and short descriptor page
      table format is now possible (with the decision based on whether the HAL
      registers any high RAM or not). The main source changes are as follows:
      
      * LongDesc and ShortDesc switches are in hdr.Options to control what
      kernel variant is built
      * PTOp and PTWhich macros introduced in hdr.ARMops to allow for
      invocation of functions / code blocks which are specific to the page
      table format. If the kernel is being built with only one page table
      format enabled, PTOp is just a BL instruction, ensuring there's no
      performance loss compared to the old code.
      * _LongDesc and _ShortDesc suffixes added to various function names, to
      allow both versions of the function to be included at once if runtime
      selection is enabled
      * Most of the kernel / MMU initialisation code in s.HAL is now encased
      in a big WHILE loop, allowing it to be duplicated if runtime switching
      is enabled (easier than adding dynamic branches all over the place, and
      only costs a few KB of ROM/RAM)
      * Some more functions (notably AccessPhysicalAddress,
      ReleasePhysicalAddress, and MapInIO) have been moved to s.ShortDesc /
      s.LongDesc since they were already 90% specific to page table format
      ba993cb5
    • Jeffrey Lee's avatar
      Add Service_PagesUnsafe64 & PagesSafe64 · 15a7d5ee
      Jeffrey Lee authored
      These use a page block with a 64bit address fields (matching OS_Memory
      64). The page list(s) contain the full list of pages involved in the
      operation, unlike the 32bit PagesUnsafe / PagesSafe calls, which only
      list pages which have 32bit addresses. The kernel issues the service
      calls in the following order:
      
      1. Service_PagesUnsafe64
      2. Service_PagesUnsafe
      3. Service_PagesSafe
      4. Service_PagesSafe64
      
      Since only one PagesUnsafe operation can occur at a time, a program
      which supports both service calls can safely ignore the PagesUnsafe /
      PagesSafe calls if a PagesUnsafe64 operation is in progress (the
      PagesUnsafe call will only list a subset of the pages from the
      PagesUnsafe64 call). The 32bit PagesUnsafe / PagesSafe calls will be
      skipped if no 32bit pages are being replaced.
      
      The addition of these calls means that NeedsSpecificPages DAs (and PMPs)
      can now request pages which have large physical addresses.
      
      Note that the page replacement logic now has the restriction that pages
      which have 32bit physical addresses can only be replaced by other pages
      which have 32bit physical addresses. This is necessary to ensure that
      users of the old 32bit APIs see the page replacement take place. However
      it does mean that programs will be unable to claim pages of low RAM
      which are in use if there are not enough free low RAM pages in the free
      pool.
      
      A future optimisation would be to update the service calls so that they
      don't list required pages which are in the free pool; if all the
      required pages are in the free pool this would allow the service calls
      (and FIQ claiming) to be skipped completely.
      15a7d5ee
    • Jeffrey Lee's avatar
      Define OS_Memory 0 page block format · 7ddbbeed
      Jeffrey Lee authored
      Add to s.ChangeDyn a definition of the OS_Memory 0 page block format,
      and update all relevant code to use those definitions instead of
      hardcoded offsets.
      7ddbbeed
  2. 17 Mar, 2021 1 commit
    • Jeffrey Lee's avatar
      Initial long descriptor support · b51b5540
      Jeffrey Lee authored
      This adds initial support for the "long descriptor" MMU page table
      format, which allows the CPU to (flexibly) use a 40-bit physical address
      space.
      
      There are still some features that need fixing (e.g. RISCOS_MapInIO
      flags), and the OS doesn't yet support RAM above the 32bit limit, but
      this set of changes is enough to allow for working ROMs to be produced.
      
      Also, move MMUControlSoftCopy initialisation out of ClearWkspRAM, since
      it's unrelated to whether the HAL has cleared the RAM or not.
      b51b5540
  3. 13 Feb, 2021 3 commits
    • Jeffrey Lee's avatar
      More code uses logical_to_physical & physical_to_ppn · 9a82cb28
      Jeffrey Lee authored
      GetPageFlagsForR0IntoR6 & MoveCAMatR0toR3 changed to use
      logical_to_physical & physical_to_ppn, to reduce the number of routines
      which are performing direct page table access.
      9a82cb28
    • Jeffrey Lee's avatar
      Start moving page table code into s.ShortDesc · ca69793c
      Jeffrey Lee authored
      In preparation for the addition of long descriptor page table support,
      start moving low-level page table routines into their own file
      (s.ShortDesc) so that we can add a corresponding long descriptor
      implementation in the future.
      
      * logical_to_physical, MakePageTablesCacheable,
      MakePageTablesNonCacheable, AllocateBackingLevel2, AMB_movepagesin_L2PT,
      AMB_movecacheablepagesout_L2PT, AMB_moveuncacheablepagesout_L2PT
      routines, and PageNumToL2PT macros, all moved to s.ShortDesc with no
      changes.
      * Add new UpdateL1PTForPageReplacement routine (by splitting some code
      out of s.ChangeDyn)
      ca69793c
    • Jeffrey Lee's avatar
      Prepare logical_to_physical for 64bit phys addrs · 4fd2dd01
      Jeffrey Lee authored
      ppn_to_physical, logical_to_physical, physical_to_ppn & ppn_to_physical
      have now all been changed to accept/receive 64bit physical addresses in
      R8,R9 instead of a 32bit address in R5. However, where a phys addr is
      being provided as an input, they may currently only pay attention to the
      bottom 32 bits of the address.
      4fd2dd01
  4. 07 Nov, 2018 1 commit
    • Jeffrey Lee's avatar
      Attempt to tidy up substitute screen mode selection logic · d87c22a4
      Jeffrey Lee authored
      Detail:
        Over the years the OS's substitute screen mode selection logic has grown to be a tangled mess, and the logic it does implement isn't always very useful. Additionally, the kernel is structured in such a way that it can be hard for modules to override it.
        This set of changes aims to fix the many of the problems, by doing the following:
        - Moving all substitute mode selection logic out of the core VDU driver code and into a Service_ModeTranslation handler. This means you now only have one place in the kernel to look instead of several, and modules can override the behaviour by claiming/blocking the service call as appropriate.
        - Moving handling of the built-in VIDC lists out of the core VDU driver code and into a Service_ModeExtension handler. This means programs can now inspect these VIDC lists by issuing the right service call (although you are essentially limited to lists which the GraphicsV driver is OK with)
        - Moving *TV interlace & offset adjustment logic into the Service_ModeExtension handler, since they're legacy things which can be handled more cleanly for MDF/EDID (and the old code was poking memory the kernel didn't own)
        - Adding a Service_EnumerateScreenModes implementation, so that if you end up in the desktop with ScreenModes non-functional, the display manager at least has something useful to show you
        - Enhancing the handling of the built-in numbered modes so that they are now available in any colour depth; the Service_ModeExtension handler (and related handlers) treat the builtin VIDC lists as a set of mode timings, not a discrete set of modes
        - Substitute mode selection logic is a complete re-write. Instead of trying a handful of numbered fallback modes, it now tries:
          - Same mode but at higher colour depths
          - Same mode but at lower colour depths
          - Alternate resolutions (half-width mode with no double-pixel if original request was for double-pixel, and default resolution for monitor type)
        - Combined with the logic to allow the builtin VIDC lists to be used at any colour depth, this means that the kernel should now be able to find substitute modes for machines which lack support for <=8bpp modes (e.g. OMAP5)
        - Additionally the mode substitution code will attempt to retain as many properties of the originally requested mode as possible (eigen values, gap mode type, etc.)
        Other improvements:
        - The kernel now actually vets the builtin VIDC lists instead of assuming that they'll work (which also means they'll have the correct ExtraBytes value, where applicable)
        - The kernel now uses GraphicsV 19 (VetMode2) to vet the mode during the mode switch process, using the result to detect where the framebuffer will be placed. This allows for GraphicsV drivers to switch between DA 2 and external framestores on a per-mode basis.
        - The kernel now supports mode selectors which specify LineLength values which are larger than necessary; this will get translated to a suitable ExtraBytes control list item (+ combined with whatever padding the driver indicates is necessary via the VetMode2 result)
        File changes:
        - hdr/KernelWS - Reserve space for a VIDC list, since the Service_ModeExtension implementation typically can't use the built-in list as-is
        - s/Arthur3 - Issue Service_ModeFileChanged when the configured monitor type is changed, so that DisplayManager + friends are aware that the set of available modes has changed
        - s/GetAll - Fiddle with GETs a bit
        - s/MemMap2 - Extra LTORG
        - s/NewIRQs - Small routine to install/uninstall false VSync routine (previously from PushModeInfo, which wasn't really the appropriate place for it)
        - s/Utility - Hook up the extra service call handlers
        - s/vdu/legacymodes - New file containing the new service call implementations, and some related code
        - s/vdu/vdudecl - Move mode workspace definition here, from vdumodes
        - s/vdu/vdudriver - Remove assorted bits of mode substitution code. Plug in new bits for calling GraphicsV 19 during mode set, and deal with ExtraBytes/LineLength during PushModeInfo
        - s/vdu/vdumodes - Move some workspace definitions to s/vdu/vdudecl. Tweak how the builtin VIDC lists are stored.
        - s/vdu/vduswis - Rip out more mode substitution code. Issue Service_ModeFileChanged when monitor type is changed by OS_ScreenMode.
      Admin:
        Tested on Raspberry Pi 3, Iyonix, IGEPv5
      
      
      Version 6.14. Tagged as 'Kernel-6_14'
      d87c22a4
  5. 03 Sep, 2017 1 commit
    • Jeffrey Lee's avatar
      Fix global OS_SynchroniseCodeAreas. ARMop tweaks. · e6c4e3c2
      Jeffrey Lee authored
      Detail:
        s/ExtraSWIs - Fix global OS_SynchroniseCodeAreas using the wrong appspace size; would have resulted in appspace only being partially synced if some pages were mapped out due to lazy swapping
        s/ARMops, s/ExtraSWIs, s/MemMap2 - Simplify code by making DCache_LineLen / ICache_LineLen store the actual line length values on ARMv7+ instead of the log2 values. Optimise SMP I-cache invalidation by allowing it to do a global invalidate. Ensure all ARMv7+ range checks use LO instead of NE, to avoid any problems with mismatched I/D line lengths (can't be sure the op range was rounded to the larger of the two)
      Admin:
        Tested on iMX6
      
      
      Version 5.88, 4.129.2.5. Tagged as 'Kernel-5_88-4_129_2_5'
      e6c4e3c2
  6. 13 Dec, 2016 4 commits
    • Jeffrey Lee's avatar
      Implement support for cacheable pagetables · 65fa6a28
      Jeffrey Lee authored
      Detail:
        Modern ARMs (ARMv6+) introduce the possibility for the page table walk hardware to make use of the data cache(s) when performing memory accesses. This can significantly reduce the cost of a TLB miss on the system, and since the accesses are cache-coherent with the CPU it allows us to make the page tables cacheable for CPU (program) accesses also, improving the performance of page table manipulation by the OS.
        Even on ARMs where the page table walk can't use the data cache, it's been measured that page table manipulation operations can still benefit from placing the page tables in write-through or bufferable memory.
        So with that in mind, this set of changes updates the OS to allow cacheable/bufferable page tables to be used by the OS + MMU, using a system-appropriate cache policy.
        File changes:
        - hdr/KernelWS - Allocate workspace for storing the page flags that are to be used by the page tables
        - hdr/OSMem - Re-specify CP_CB_AlternativeDCache as having a different behaviour on ARMv6+ (inner write-through, outer write-back)
        - hdr/Options - Add CacheablePageTables option to allow switching back to non-cacheable page tables if necessary. Add SyncPageTables var which will be set {TRUE} if either the OS or the architecture requires a DSB after writing to a faulting page table entry.
        - s/ARM600, s/VMSAv6 - Add new SetTTBR & GetPageFlagsForCacheablePageTables functions. Update VMSAv6 for wider XCBTable (now 2 bytes per element)
        - s/ARMops - Update pre-ARMv7 MMU_Changing ARMops to drain the write buffer on entry if cacheable pagetables are in use (ARMv7+ already has this behaviour due to architectural requirements). For VMSAv6 Normal memory, change the way that the OS encodes the cache policy in the page table entries so that it's more compatible with the encoding used in the TTBR.
        - s/ChangeDyn - Update page table page flag handling to use PageTable_PageFlags. Make use of new PageTableSync macro.
        - s/Exceptions, s/AMBControl/memmap - Make use of new PageTableSync macro.
        - s/HAL - Update MMU initialisation sequence to make use of PageTable_PageFlags + SetTTBR
        - s/Kernel - Add PageTableSync macro, to be used after any write to a faulting page table entry
        - s/MemInfo - Update OS_Memory 0 page flag conversion. Update OS_Memory 24 to use new symbol for page table access permissions.
        - s/MemMap2 - Use PageTableSync. Add routines to enable/disable cacheable pagetables
        - s/NewReset - Enable cacheable pagetables once we're fully clear of the MMU initialision sequence (doing earlier would be trickier due to potential double-mapping)
      Admin:
        Tested on pretty much everything currently supported
        Delivers moderate performance benefits to page table ops on old systems (e.g. 10% faster), astronomical benefits on some new systems (up to 8x faster)
        Stats: https://www.riscosopen.org/forum/forums/3/topics/2728?page=2#posts-58015
      
      
      Version 5.71. Tagged as 'Kernel-5_71'
      65fa6a28
    • Jeffrey Lee's avatar
      Make MMU_Changing ARMops perform the sub-operations in a sensible order · 9a96263a
      Jeffrey Lee authored
      Detail:
        For a while we've known that the correct way of doing cache maintenance on ARMv6+ (e.g. when converting a page from cacheable to non-cacheable) is as follows:
        1. Write new page table entry
        2. Flush old entry from TLB
        3. Clean cache + drain write buffer
        The MMU_Changing ARMops (e.g. MMU_ChangingEntry) implement the last two items, but in the wrong order. This has caused the operations to fall out of favour and cease to be used, even in pre-ARMv6 code paths where the effects of improper cache/TLB management perhaps weren't as readily visible.
        This change re-specifies the relevant ARMops so that they perform their sub-operations in the correct order to make them useful on modern ARMs, updates the implementations, and updates the kernel to make use of the ops whereever relevant.
        File changes:
        - Docs/HAL/ARMop_API - Re-specify all the MMU_Changing ARMops to state that they are for use just after a page table entry has been changed (as opposed to before - e.g. 5.00 kernel behaviour). Re-specify the cacheable ones to state that the TLB invalidatation comes first.
        - s/ARM600, s/ChangeDyn, s/HAL, s/MemInfo, s/VMSAv6, s/AMBControl/memmap - Replace MMU_ChangingUncached + Cache_CleanInvalidate pairs with equivalent MMU_Changing op
        - s/ARMops - Update ARMop implementations to do everything in the correct order
        - s/MemMap2 - Update ARMop usage, and get rid of some lingering sledgehammer logic from ShuffleDoublyMappedRegionForGrow
      Admin:
        Tested on pretty much everything currently supported
      
      
      Version 5.70. Tagged as 'Kernel-5_70'
      9a96263a
    • Jeffrey Lee's avatar
      Place restrictions on the use of cacheable doubly-mapped DAs · 2704c756
      Jeffrey Lee authored
      Detail:
        The kernel has always allowed software to create cacheable doubly-mapped DAs, despite the fact that the VIVT caches used on ARMv5 and below would have no way of keeping both of the mappings coherent
        This change places restrictions the following restrictions on doubly-mapped areas, to ensure that cache settings which can't be supported by the cache architecture of the CPU can't be selected:
        * On ARMv6 and below, cacheable doubly-mapped areas aren't supported.
          * Although ARMv6 has VIPT data caches, it's also subject to page colouring constraints which would require us to force the DA size to be a multiple of 16k. So for now keep things simple and disallow cacheable doubly-mapped areas on ARMv6.
        * On ARMv7 and above, cacheable doubly-mapped areas are allowed, but only if they are marked non-executable
          * The blocker to allowing executable cacheable doubly-mapped areas are the VIPT instruction caches; OS_SynchroniseCodeAreas (or callers of it) would need to know that a doubly-mapped area is in use so that they can flush both mappings from the I-cache. Although some chips do have PIPT instruction caches, again it isn't really worth supporting executable cacheable doubly-mapped areas at the moment.
        These changes also allow us to get rid of the expensive 'sledgehammer' logic when dealing with doubly-mapped areas
        File changes:
        - s/ARM600, s/VMSAv6 - Remove the sledgehammer logic, only perform cache/TLB maintenance for the required areas
        - s/ChangeDyn - Implement the required checks
        - s/MemMap2 - Move some cache maintenance logic into RemoveCacheabilityR0ByMinusR2, which previously would have had to be performed by the caller due to the sledgehammer paranoia
      Admin:
        Cacheable doubly-mapped DAs tested on iMx6 (tried making screen memory write-through cacheable; decent performance gain seen)
        Note OS_Memory 0 "make temporarily uncacheable" doesn't work on doubly-mapped areas, so cacheable doubly-mapped areas are not yet safe for general DMA
      
      
      Version 5.69. Tagged as 'Kernel-5_69'
      2704c756
    • Jeffrey Lee's avatar
      Make s/ChangeDyn slightly more readable by splitting some routines out into a separate file · 4a6150dc
      Jeffrey Lee authored
      Detail:
        s/MemMap2 - New file containing assorted low-level memory mapping routines taken from s/ChangeDyn. N.B. There's no special significance to this being named "MemMap2", it's just a name that stuck due to some earlier (abandoned) changes which added a file named "MemMap".
        s/ChangeDyn - Remove the routines/chunks of code that were moved to s/MemMap2. Also some duplicate code removal (Regular DA grow code and DoTheGrowNotSpecified are now rely on the new DoTheGrowCommon routine for doing the actual grow)
        s/GetAll - GET s/MemMap2 at an appropriate time
      Admin:
        Tested on pretty much everything currently supported
      
      
      Version 5.67. Tagged as 'Kernel-5_67'
      4a6150dc