1. 07 Aug, 2021 1 commit
  2. 28 Jul, 2021 4 commits
    • Jeffrey Lee's avatar
      Make OS_Memory 24 report Abortable DAs · b98ccef2
      Jeffrey Lee authored
      Version 6.58. Tagged as 'Kernel-6_58'
      b98ccef2
    • Jeffrey Lee's avatar
      Add AP 1 emulation for long descriptor page tables · f93d930d
      Jeffrey Lee authored
      The long descriptor page table format doesn't support RISC OS access
      privilege 1 (user RX, privileged RWX). Previously we were downgrading
      this to AP 0 (user RWX, privileged RWX), which obviously weakens the
      security of the memory. However now that we have an AbortTrap
      implementation, we can map the memory as "user none, privileged RWX" and
      provide user read support via AbortTrap's instruction decode & execute
      logic.
      
      There's no support for executing usermode code from the memory, but the
      compatibility issues caused by that are likely to be minimal.
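
      As a rough illustration of where AP 1 comes from: it's selected via the
      access privilege bits when an area is created. A minimal sketch in C,
      assuming the usual OS_DynamicArea 0 register layout and that the access
      privilege lives in the low bits of the area flags (recalled from the
      PRM, not taken from this change, so verify before relying on it; the
      area name and function name are made up):

        /* Sketch: create a dynamic area with access privilege 1
         * (user read-only, privileged read/write).  Register usage is an
         * assumption - check OS_DynamicArea 0 in the PRM. */
        #include <stdio.h>
        #include "kernel.h"
        #include "swis.h"

        int create_ap1_area(void)
        {
            int number, base;
            _kernel_oserror *e = _swix(OS_DynamicArea,
                                       _INR(0,8) | _OUT(1) | _OUT(3),
                                       0,         /* reason: create                  */
                                       -1,        /* let the OS pick the area number */
                                       0x1000,    /* initial size: one page          */
                                       -1,        /* let the OS pick the base        */
                                       1,         /* area flags: AP 1 (assumed encoding) */
                                       0x1000,    /* maximum size                    */
                                       0, 0,      /* no handler routine / workspace  */
                                       "AP1Demo", /* hypothetical area name          */
                                       &number, &base);
            if (e != NULL) { printf("error: %s\n", e->errmess); return -1; }
            printf("DA %d at &%08x\n", number, (unsigned) base);
            return number;
        }

      On a long descriptor kernel, user-mode reads of such an area would then
      be satisfied by the AbortTrap emulation described above.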
      f93d930d
    • Jeffrey Lee's avatar
      Add OS_AbortTrap implementation · c199c178
      Jeffrey Lee authored
      This supports all the load/store instructions, including FPA & VFP/NEON.
      Most instructions are handled directly via the base version of the
      AbortTrap API that was first implemented in RISC OS Select. However, to
      properly cope with LDREX/STREX, and future support for prefetch aborts,
      the API has been extended to allow the kernel to request that a block of
      memory is mapped in with certain permissions. For LDREX/STREX the kernel
      will then rewind the PC so that the instruction can be retried directly.
      
      Test code exists in Dev/AbortTrap to allow checking of all the major
      functionality, along with code for building it as a softloadable
      module for easier/quicker testing.
      c199c178
    • Jeffrey Lee's avatar
      Allow RW/ZI sections to be used · 2b665896
      Jeffrey Lee authored
      * Instruct the linker to place any RW/ZI data sections in the last ~16MB
      of the memory map, starting from &ff000000 (with the current toolchain,
      giving it a fixed base address is much easier than giving it a variable
      base address)
      * The RW/ZI section is mapped as completely inaccessible to user mode
      * The initial content of the RW section is copied over shortly after MMU
      startup (in Continue_after_HALInit)
      * Since link's -bin option produces a file containing a copy of the
      (zero-initialised) ZI section, the kernel binary is now produced from a
      "binary with AIF header" AIF with the help of the new 'kstrip' tool.
      kstrip extracts just the RO and RW sections, ensuring the ROM doesn't
      contain a redundant block of zeros for the ZI section.
      
      This should make it easier to use C code & arbitrary libraries within
      the kernel, providing they're compiled with suitable settings (e.g.
      non-module, no FP, no stack checking, like HALs typically use)
      2b665896
  3. 28 Apr, 2021 8 commits
    • Jeffrey Lee's avatar
      Log -> phys conversion improvements · 46081bca
      Jeffrey Lee authored
      * RISCOS_LogToPhys upgraded to allow it to cope with all page types
      (added support for 64KB "large" pages and lazily-mapped pages)
      
      * Added OS_Memory 65, which calls through to RISCOS_LogToPhys, to allow
      regular software to do logical-to-physical conversions for all page
      types (other calls, like OS_Memory 0/64, typically only work with 4KB
      pages)
      
      * LoadAndDecodeL2Entry updated to always return a page/entry size, like
      LoadAndDecodeL1Entry
      
      Version 6.56. Tagged as 'Kernel-6_56'
      46081bca
    • Jeffrey Lee's avatar
      Support runtime selection of pagetable format · ba993cb5
      Jeffrey Lee authored
      Runtime selection between long descriptor and short descriptor page
      table format is now possible (with the decision based on whether the HAL
      registers any high RAM or not). The main source changes are as follows:
      
      * LongDesc and ShortDesc switches are in hdr.Options to control what
      kernel variant is built
      * PTOp and PTWhich macros introduced in hdr.ARMops to allow for
      invocation of functions / code blocks which are specific to the page
      table format. If the kernel is being built with only one page table
      format enabled, PTOp is just a BL instruction, ensuring there's no
      performance loss compared to the old code (a conceptual sketch follows
      this list).
      * _LongDesc and _ShortDesc suffixes added to various function names, to
      allow both versions of the function to be included at once if runtime
      selection is enabled
      * Most of the kernel / MMU initialisation code in s.HAL is now encased
      in a big WHILE loop, allowing it to be duplicated if runtime switching
      is enabled (easier than adding dynamic branches all over the place, and
      only costs a few KB of ROM/RAM)
      * Some more functions (notably AccessPhysicalAddress,
      ReleasePhysicalAddress, and MapInIO) have been moved to s.ShortDesc /
      s.LongDesc since they were already 90% specific to page table format
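
      The PTOp/PTWhich mechanism itself is an assembler macro, but the idea
      is easy to see in a C-flavoured sketch (purely illustrative - the
      function bodies and the selection variable below are invented, only
      the AllocateBackingLevel2 name is a real kernel routine):

        #include <stdio.h>

        /* Stand-ins for the two per-format implementations. */
        static void AllocateBackingLevel2_ShortDesc(void) { puts("short descriptor path"); }
        static void AllocateBackingLevel2_LongDesc(void)  { puts("long descriptor path");  }

        /* Set once during MMU init, e.g. from whether the HAL registered any
         * high RAM. */
        static int using_long_descriptors = 1;

        /* Both formats built in: dispatch at runtime.  A single-format build
         * would reduce this to a direct call - the assembler equivalent of a
         * plain BL. */
        #define PTOp(fn) (using_long_descriptors ? fn##_LongDesc() : fn##_ShortDesc())

        int main(void)
        {
            PTOp(AllocateBackingLevel2);
            return 0;
        }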
      ba993cb5
    • Jeffrey Lee's avatar
      Remove 1MB bodge from LongDesc LoadAndDecodeL1Entry · ce95d42e
      Jeffrey Lee authored
      LoadAndDecodeL1Entry will now always return the size/alignment of the
      entry. This allows ConstructCAMfromPageTables to walk over a 2MB long
      descriptor page table pointer in one go, instead of splitting it into
      two 1MB chunks (as if short descriptor page tables were in use) and
      calling LoadAndDecodeL1Entry twice. This has allowed the 1MB result
      alignment bodge to be removed from the LongDesc version of
      LoadAndDecodeL1Entry.
      ce95d42e
    • Jeffrey Lee's avatar
      Update OS_Memory 19 to understand non-DMAable memory · 235668bc
      Jeffrey Lee authored
      If the HAL has flagged a chunk of RAM as non-DMAable, OS_Memory 19
      (DMAPrep) will now indicate that DMA to/from that region should use a
      bounce buffer.
      235668bc
    • Jeffrey Lee's avatar
      Extend OS_Memory 19 for 64bit phys addresses · b53b73cd
      Jeffrey Lee authored
      Bit 11 of R0 can be used to indicate that the callback functions use
      64bit physical addresses instead of 32bit ones.
      b53b73cd
    • Jeffrey Lee's avatar
      Add OS_Memory 64, to supersede OS_Memory 0 · d5e91a02
      Jeffrey Lee authored
      OS_Memory 64 is an extended form of OS_Memory 0 which uses 64bit
      addresses instead of 32bit. Using 64bit physical addresses allows
      conversions to/from physical addresses to be performed on pages with
      large physical addresses. Using 64bit logical addresses provides some
      future-proofing for an AArch64 version of RISC OS, with a 64bit logical
      memory map.
      d5e91a02
    • Jeffrey Lee's avatar
      Define OS_Memory 0 page block format · 7ddbbeed
      Jeffrey Lee authored
      Add to s.ChangeDyn a definition of the OS_Memory 0 page block format,
      and update all relevant code to use those definitions instead of
      hardcoded offsets.
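
      For reference, that page block is the long-standing three-words-per-page
      layout (page number, logical address, physical address), with flag bits
      in R0 saying which fields are supplied and which should be filled in. A
      minimal logical-to-physical sketch in C - the flag bit positions below
      are placeholders from memory, so take the real values from the exported
      OSMem header rather than trusting these (OS_Memory 64 follows the same
      idea with 64bit fields, whose layout isn't reproduced here):

        #include <stdio.h>
        #include "kernel.h"
        #include "swis.h"

        /* One OS_Memory 0 page block entry: page number, logical, physical. */
        typedef struct { unsigned ppn, logical, physical; } page_entry;

        /* Assumed flag bits: "logical address supplied", "physical wanted". */
        #define LOGICAL_GIVEN   (1u << 10)   /* placeholder value */
        #define PHYSICAL_WANTED (1u << 13)   /* placeholder value */

        int main(void)
        {
            static int data;   /* something we want the physical address of */
            page_entry block = { 0, (unsigned) &data, 0 };

            _kernel_oserror *e = _swix(OS_Memory, _INR(0,2),
                                       0 | LOGICAL_GIVEN | PHYSICAL_WANTED,
                                       &block, 1);
            if (e != NULL) { printf("error: %s\n", e->errmess); return 1; }
            printf("logical &%08x -> physical &%08x\n", block.logical, block.physical);
            return 0;
        }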
      7ddbbeed
    • Jeffrey Lee's avatar
      Support RAM banks with high physical addresses · df4efb68
      Jeffrey Lee authored
      This changes PhysRamTable to store the address of each RAM bank in terms
      of (4KB) pages instead of bytes, effectively allowing it to support a 44
      bit physical address space. This means that (when the long descriptor
      page table format is used) the OS can now make use of memory located
      outside the lower 4GB of the physical address space. However some
      public APIs still need extending to allow for all operations to be
      supported on high RAM (e.g. OS_Memory logical to physical address
      lookups)
      
      OS_Memory 12 (RecommendPage) has been extended to allow R4-R7 to be used
      to specify a (64bit) physical address range which the recommended pages
      must lie within. For backwards compatibility this defaults to 0-4GB.
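
      A sketch of asking for pages below 4GB using the new range arguments;
      R4-R7 are as described above (assumed order: start low/high, end
      low/high), but the base-call registers (size in R1, log2 alignment in
      R2, result in R3) are recalled from the existing RecommendPage
      behaviour and should be treated as assumptions:

        #include <stdio.h>
        #include "kernel.h"
        #include "swis.h"

        int main(void)
        {
            unsigned page;   /* recommended starting page number */
            _kernel_oserror *e = _swix(OS_Memory, _INR(0,7) | _OUT(3),
                                       12,              /* RecommendPage                        */
                                       1 << 20,         /* 1MB wanted (assumed: bytes in R1)    */
                                       12,              /* 4KB alignment (assumed: log2 in R2)  */
                                       0,               /* R3 assumed unused on entry           */
                                       0, 0,            /* range start = 0 (low, high)          */
                                       0xFFFFFFFFu, 0,  /* range end just below 4GB (low, high) */
                                       &page);
            if (e != NULL) { printf("error: %s\n", e->errmess); return 1; }
            printf("recommended page number: %u\n", page);
            return 0;
        }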
      df4efb68
  4. 17 Mar, 2021 2 commits
    • Jeffrey Lee's avatar
      Remove CAM size limit · 79bc3343
      Jeffrey Lee authored
      Previously the CAM sat inside a fixed 16MB window, restricting it to
      storing the details of 1 million pages, i.e. 4GB of RAM. Shuffle things
      around a bit to allow this restriction to be removed: the CAM is now
      located just above the IO region, and the CAM start address /
      IO top will be calculated appropriately during kernel init. This change
      paves the way for us to support machines with over 4GB of RAM.
      
      FixedAreasTable has also been removed, since it's no longer really
      necessary (DAs can only be created between the top of application space
      and the bottom of the used IO space, and it's been a long time since
      we've had any fixed bits in the middle of there)
      79bc3343
    • Jeffrey Lee's avatar
      Initial long descriptor support · b51b5540
      Jeffrey Lee authored
      This adds initial support for the "long descriptor" MMU page table
      format, which allows the CPU to (flexibly) use a 40-bit physical address
      space.
      
      There are still some features that need fixing (e.g. RISCOS_MapInIO
      flags), and the OS doesn't yet support RAM above the 32bit limit, but
      this set of changes is enough to allow for working ROMs to be produced.
      
      Also, move MMUControlSoftCopy initialisation out of ClearWkspRAM, since
      it's unrelated to whether the HAL has cleared the RAM or not.
      b51b5540
  5. 13 Feb, 2021 5 commits
    • Jeffrey Lee's avatar
      [RISCOS_]AccessPhysicalAddress uses page flags · 7924aae2
      Jeffrey Lee authored
      Currently RISCOS_AccessPhysicalAddress allows the caller to specify the
      permissions/properties of the mapped memory by directly specifying some
      of the L1 page table entry flags. This will complicate things when
      adding support for more page table formats, so change it so that
      standard RISC OS page flags are used instead (as the alternate entry
      point, RISCOS_AccessPhysicalAddressUnchecked, already does).
      
      Also, drop the "RISCOS_" prefix from RISCOS_AccessPhysicalAddress and
      RISCOS_ReleasePhysicalAddress, and remove the references to these
      routines from the HAL docs. These routines have never been exposed to
      the HAL, so renaming them and removing them from the docs should make
      their status clearer.
      
      Version 6.52. Tagged as 'Kernel-6_52'
      7924aae2
    • Jeffrey Lee's avatar
      Remove more direct page table access · 858949b6
      Jeffrey Lee authored
      RISCOS_LogToPhys and OS_Memory 20 (compatibility page) changed to use
      suitable subroutines for reading the page tables instead of accessing
      them directly.
      858949b6
    • Jeffrey Lee's avatar
      DecodeL1/L2Entry -> LoadAndDecodeL1/L2Entry · 846eee02
      Jeffrey Lee authored
      Change the DecodeL1/L2Entry routines so that instead of accepting a page
      table entry as input, they accept a (suitably aligned) logical address
      and fetch the page table entry themselves. This helps insulate the
      calling code from the finer details of the page table format.
      846eee02
    • Jeffrey Lee's avatar
      Start moving page table code into s.ShortDesc · ca69793c
      Jeffrey Lee authored
      In preparation for the addition of long descriptor page table support,
      start moving low-level page table routines into their own file
      (s.ShortDesc) so that we can add a corresponding long descriptor
      implementation in the future.
      
      * logical_to_physical, MakePageTablesCacheable,
      MakePageTablesNonCacheable, AllocateBackingLevel2, AMB_movepagesin_L2PT,
      AMB_movecacheablepagesout_L2PT, AMB_moveuncacheablepagesout_L2PT
      routines, and PageNumToL2PT macros, all moved to s.ShortDesc with no
      changes.
      * Add new UpdateL1PTForPageReplacement routine (by splitting some code
      out of s.ChangeDyn)
      ca69793c
    • Jeffrey Lee's avatar
      Prepare logical_to_physical for 64bit phys addrs · 4fd2dd01
      Jeffrey Lee authored
      ppn_to_physical, logical_to_physical & physical_to_ppn have now all
      been changed to accept or return 64bit physical addresses in R8,R9
      instead of a 32bit address in R5. However, where a phys addr is
      being provided as an input, they may currently only pay attention to the
      bottom 32 bits of the address.
      4fd2dd01
  6. 11 Jul, 2020 1 commit
    • Jeffrey Lee's avatar
      Fix OS_Memory 7 for discontiguous RAM · fb127e47
      Jeffrey Lee authored
      The current OS_Memory 7 implementation uses an address range structure
      returned by HAL_PhysInfo to decide which part of the physical address
      arrangement table to overwrite with RAM information. I suspect the
      original intention was for OS_Memory to use this address range to avoid
      marking the VRAM as DRAM (HAL_PhysInfo is expected to fill in the VRAM
      itself). But by overwriting everything between the start and the end
      address, OS_Memory will also overwrite any non-RAM areas which are
      sandwiched between RAM banks, e.g. the VideoCore-owned RAM on Pi models
      with >1GB RAM. There's also the problem that the address range returned
      by the HAL uses 32bit addresses, so it won't work as-is for RAM
      located above the 4GB barrier.
      
      Fix these issues by reworking the routine so that it ignores the address
      range returned by the HAL and instead detects VRAM by checking the
      IsVRAM flag in the PhysRamTable entry. And for detecting if the ROM is
      running from RAM, instead of using the address range we can rely on the
      flag available via OS_ReadSysInfo 8 (i.e. HAL_PlatformInfo), like
      OS_Memory 8 does.
      
      Also add a simple BASIC program (Dev.PhysInfo) to allow easy checking of
      HAL & OS physical address arrangement tables.
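
      Dev.PhysInfo is a BASIC program; a rough C equivalent of reading the
      OS's table looks like the sketch below. Reason codes 6 ("read table
      size") and 7 ("read table") and the exit registers are recalled from
      the PRM rather than taken from this change, so double-check them:

        #include <stdio.h>
        #include <stdlib.h>
        #include "kernel.h"
        #include "swis.h"

        int main(void)
        {
            unsigned size, pagesize;
            _kernel_oserror *e;

            /* OS_Memory 6: assumed to return R1 = table size, R2 = page size. */
            e = _swix(OS_Memory, _IN(0) | _OUT(1) | _OUT(2), 6, &size, &pagesize);
            if (e != NULL) { printf("error: %s\n", e->errmess); return 1; }

            unsigned char *table = malloc(size);
            if (table == NULL) return 1;

            /* OS_Memory 7: read the arrangement table into the buffer at R1. */
            e = _swix(OS_Memory, _INR(0,1), 7, table);
            if (e != NULL) { printf("error: %s\n", e->errmess); return 1; }

            printf("%u-byte arrangement table, page size %u\n", size, pagesize);
            free(table);
            return 0;
        }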
      
      Version 6.41. Tagged as 'Kernel-6_41'
      fb127e47
  7. 19 Nov, 2019 1 commit
    • Jeffrey Lee's avatar
      Allow reservation of memory pages · 1f84ad9f
      Jeffrey Lee authored
      This change adds a new OS_Memory reason code, 23, for reserving memory
      without actually assigning it to a dynamic area. Other dynamic areas can
      still use the memory, but only the code that reserved it will be allowed
      to claim exclusive use over it (i.e. PageFlags_Unavailable).
      
      This is useful for systems such as the PCI heap, where physically
      contiguous memory is required, but the memory isn't needed all of the
      time. By reserving the pages, it allows other regular DAs to make use of
      the memory when the PCI heap is small. But when the PCI heap needs to
      grow, it guarantees that (if there's enough free memory in the system)
      the previously reserved pages can be allocated to the PCI heap.
      
      Notes:
      
      * Reservations are handled on an honour system; there's no checking that
      the program that reserved the memory is the one attempting to map it in.
      
      * For regular NeedsSpecificPages DAs, reserved pages can only be used if
      the special "RESV" R0 return value is used.
      
      * For PMP DAs, reserved pages can only be made Unavailable if the entry
      in the page block also specifies the Reserved page flag. The actual
      state of the Reserved flag can't be modified via PMP DA ops, the flag is
      only used to indicate the caller's permission/intent to make the page
      Unavailable.
      
      * If a PMP DA tries to make a Reserved page Unavailable without
      specifying the Reserved flag, the kernel will try to swap it out for a
      replacement page taken from the free pool (preserving the contents and
      generating Service_PagesUnsafe / Service_PagesSafe, as if another DA
      had claimed the page)
      
      Version 6.28. Tagged as 'Kernel-6_28'
      1f84ad9f
  8. 16 Aug, 2019 3 commits
    • Ben Avison's avatar
      Support supersection-mapped memory in OS_Memory 24 · bd294cf9
      Ben Avison authored
      To achieve this:
      * DecodeL1Entry and DecodeL2Entry return 64-bit physical addresses in
        r0 and r1, with additional return values shuffled up to r2 and r3
      * DecodeL1Entry now returns the section size, so callers can distinguish
        section- from supersection-mapped memory
      * PhysAddrToPageNo now accepts a 64-bit address (though since the physical
        RAM table is currently still all 32-bit, it will report any top-word-set
        addresses as being not in RAM)
      
      Version 6.22. Tagged as 'Kernel-6_22'
      bd294cf9
    • Ben Avison's avatar
      Support temporary mapping of IO above 4GB using supersections · 96913c1f
      Ben Avison authored
      Add a new reason code, OS_Memory 22, equivalent to OS_Memory 14, but
      accepting a 64-bit physical address in r1/r2. Current ARM architectures can
      only express 40-bit or 32-bit physical addresses in their page tables
      (depending on whether they feature the LPAE extension or not) so unlike
      OS_Memory 14, OS_Memory 22 can return an error if an invalid physical
      address has been supplied. OS_Memory 15 should still be used to release a
      temporary mapping, whether you claimed it using OS_Memory 14 or OS_Memory 22.
      
      The logical memory map has had to change to accommodate supersection mapping
      of the physical access window, which needs to be 16MB wide and aligned to a
      16MB boundary. This results in there being 16MB less logical address space
      available for dynamic areas on all platforms (sorry) and there is now a 1MB
      hole spare in the system address range (above IO).
      
      The internal function RISCOS_AccessPhysicalAddress has been changed to
      accept a 64-bit physical address. This function has been a candidate for
      adding to the kernel entry points from the HAL for a long time - enough that
      it features in the original HAL documentation - but has not been so added
      (at least not yet) so there are no API compatibility issues there.
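
      A sketch of a temporary mapping of a high physical address from C: the
      64-bit address in r1/r2 and the use of OS_Memory 15 to release are as
      described above, but where the size goes and where the logical address
      comes back are assumptions for illustration (and the address itself is
      made up):

        #include <stdio.h>
        #include "kernel.h"
        #include "swis.h"

        int main(void)
        {
            void *logical;
            unsigned phys_lo = 0x10000000u, phys_hi = 0x1u;   /* &1_10000000 */

            /* OS_Memory 22: like OS_Memory 14 but with a 64-bit physical
             * address in R1/R2.  Assumed: size in R3, logical address
             * returned in R3 - check the OSMem header before use. */
            _kernel_oserror *e = _swix(OS_Memory, _INR(0,3) | _OUT(3),
                                       22, phys_lo, phys_hi, 0x1000, &logical);
            if (e != NULL) { printf("error: %s\n", e->errmess); return 1; }

            /* ... access the device through 'logical' ... */

            /* OS_Memory 15 releases a temporary mapping made by either 14 or
             * 22 (assumed to need no further arguments). */
            _swix(OS_Memory, _IN(0), 15);
            return 0;
        }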
      
      Requires RiscOS/Sources/Programmer/HdrSrc!2
      96913c1f
    • Ben Avison's avatar
      Support permanent mapping of IO above 4GB using supersections · 9024d1f6
      Ben Avison authored
      This is facilitated by two extended calls. From the HAL:
      * RISCOS_MapInIO64 allows the physical address to be specified as 64-bit
      
      From the OS:
      * OS_Memory 21 acts like OS_Memory 13, but takes a 64-bit physical address
      
      There is no need to extend RISCOS_LogToPhys, instead we change its return
      type to uint64_t. Any existing HALs will only read the a1 register, thereby
      narrowing the result to 32 bits, which is fine because all existing HALs
      only expected a 32-bit physical address space anyway.
      
      Internally, RISCOS_MapInIO has been rewritten to detect and use supersections
      for IO regions that end above 4GB. Areas that straddle the 4GB boundary should
      also work, although if you then search for a sub-area that doesn't, it won't
      find a match and will instead map it in again using vanilla sections - this is
      enough of an edge case that I don't think we need to worry about it too much.
      
      The rewrite also conveniently fixes a bug in the old code: if the area being
      mapped in went all the way up to physical address 0xFFFFFFFF (inclusive) then
      only the first megabyte of the area was actually mapped in due to a loop
      termination issue.
      
      Requires RiscOS/Sources/Programmer/HdrSrc!2
      9024d1f6
  9. 08 Jul, 2018 1 commit
    • Jeffrey Lee's avatar
      Fix OS_Memory 0 "make temporarily uncacheable" not reporting errors · c5569c81
      Jeffrey Lee authored
      Detail:
        s/MemInfo - The wrapper around OS_Memory 0 introduced in Kernel-5_35-4_79_2_311 was preserving the wrong PSR field on exit, causing any error generated by the core code to be lost.
      Admin:
        Tested on Iyonix
        Fixes *screensave saving mostly white pixels (address translation for "external" VRAM should have failed and caused ADFS to fall back to a bounce buffer)
        Is also likely to be the cause of https://www.riscosopen.org/forum/forums/5/topics/11713 (address translation should have failed for soft ROM)
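
        The practical upshot is that callers now actually see the error, so
        DMA code should check it and fall back to a bounce buffer. A minimal
        sketch - the "make temporarily uncacheable" flag bit and the other R0
        bits are placeholders, not the real definitions:

          #include <stdio.h>
          #include "kernel.h"
          #include "swis.h"

          #define LOGICAL_GIVEN    (1u << 10)   /* placeholder values - take the   */
          #define PHYSICAL_WANTED  (1u << 13)   /* real ones from the OSMem header */
          #define MAKE_UNCACHEABLE (1u << 14)

          /* Translate a buffer's address and make it temporarily uncacheable
           * ahead of DMA; on failure the caller should use a bounce buffer. */
          static int prepare_for_dma(void *buffer, unsigned *phys)
          {
              unsigned block[3] = { 0, (unsigned) buffer, 0 };
              _kernel_oserror *e = _swix(OS_Memory, _INR(0,2),
                                         0 | LOGICAL_GIVEN | PHYSICAL_WANTED | MAKE_UNCACHEABLE,
                                         block, 1);
              if (e != NULL) {
                  printf("translation failed: %s\n", e->errmess);
                  return -1;   /* e.g. "external" VRAM or soft ROM */
              }
              *phys = block[2];
              return 0;
          }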
      
      
      Version 6.10. Tagged as 'Kernel-6_10'
      c5569c81
  10. 07 Oct, 2017 1 commit
    • Jeffrey Lee's avatar
      Tweak handling of zero page compatibility page · 36062ff5
      Jeffrey Lee authored
      Detail:
        s/MemInfo, hdr/KernelWS - Rather than peeking L2PT to determine if the compatibility page is enabled, use a workspace var to track its state. This ensures we won't get confused if other software decides to map something of its own to &0.
        s/NewReset - Ensure the CompatibilityPageEnabled flag is initialised correctly
      Admin:
        Tested in Iyonix ROM softload
      
      
      Version 5.90. Tagged as 'Kernel-5_90'
      36062ff5
  11. 19 Aug, 2017 1 commit
    • Jeffrey Lee's avatar
      Add a compatibility page zero for high processor vectors / zero page relocation builds · ffac5791
      Jeffrey Lee authored
      Detail:
        When HiProcVecs is enabled, there will now be a read-only page located at &0 in order to ease compatibility with buggy software which reads from null pointers
        Although most of the page is zero-filled, the start of the page contains a few words holding invalid pointer values (to discourage dereferencing them), plus a warning message that appears if the memory is interpreted as a string.
        On ARMv6+ the page is also made non-executable, to deal with branch-through-zero type situations
        OS_Memory 20 has been introduced as a way of determining whether the compatibility page is present, and also to enable/disable it
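
        As a rough illustration of how a program might use it - the register
        interface here (R1 selecting the new state and returning the old one)
        is an assumption for the sketch, not taken from the sources:

          #include <stdio.h>
          #include "kernel.h"
          #include "swis.h"

          int main(void)
          {
              int old_state;
              /* OS_Memory 20: query/enable/disable the compatibility page.
               * Assumed: R1 = -1 just reads the current state. */
              _kernel_oserror *e = _swix(OS_Memory, _INR(0,1) | _OUT(1),
                                         20, -1, &old_state);
              if (e != NULL) { printf("error: %s\n", e->errmess); return 1; }
              printf("compatibility page %s\n", old_state ? "present" : "absent");
              return 0;
          }
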
        File changes:
        - hdr/Options - Add CompatibilityPage option
        - hdr/OSMem - Declare OS_Memory reason code 20
        - hdr/KernelWS - When CompatibilityPage is enabled, make sure nothing else is located at &0
        - s/NewReset - Enable compatibility page just before Service_PostInit (try and keep zero-tolerance policy for null pointer dereferencing during ROM init)
        - s/MemInfo - OS_Memory 20 implementation. Add knowledge of the compatibility page to OS_Memory 16 and 24.
      Admin:
        Tested on BB-xM
      
      
      Version 5.87. Tagged as 'Kernel-5_87'
      ffac5791
  12. 12 Aug, 2017 1 commit
    • Jeffrey Lee's avatar
      Add OS_Memory 19, which is intended to replace the OS_Memory 0 "make... · b47fdbb1
      Jeffrey Lee authored
      Add OS_Memory 19, which is intended to replace the OS_Memory 0 "make uncacheable" feature, when used for DMA
      
      Detail:
        Making pages uncacheable to allow them to be used with DMA can be troublesome for a number of reasons:
        * Many processors ignore cache hits for non-cacheable pages, so to avoid breaking any IRQ handlers the page table manipulation + cache maintenance must be performed with IRQs disabled, impacting the IRQ latency of the system
        * Some processors don't support LDREX/STREX to non-cacheable pages
        * In SMP setups it may be necessary to temporarily park the other cores somewhere safe, or perform some other explicit synchronisation to make sure they all have consistent views of the cache/TLB
        The above issues are most likely to cause problems when the page is shared by multiple programs; a DMA operation which targets one part of a page could impact the programs which are using the other parts.
        To combat these problems, OS_Memory 19 is being introduced, which allows DMA cache coherency/address translation to be performed without altering the attributes of the pages.
        Files changed:
        - hdr/OSMem - Add definitions for OS_Memory 19
        - s/MemInfo - Add OS_Memory 19 implementation
      Admin:
        Tested on Raspberry Pi 3, iMx6
      
      
      Version 5.86, 4.129.2.3. Tagged as 'Kernel-5_86-4_129_2_3'
      b47fdbb1
  13. 13 Dec, 2016 3 commits
    • Jeffrey Lee's avatar
      Implement support for cacheable pagetables · 65fa6a28
      Jeffrey Lee authored
      Detail:
        Modern ARMs (ARMv6+) introduce the possibility for the page table walk hardware to make use of the data cache(s) when performing memory accesses. This can significantly reduce the cost of a TLB miss on the system, and since the accesses are cache-coherent with the CPU it allows us to make the page tables cacheable for CPU (program) accesses also, improving the performance of page table manipulation by the OS.
        Even on ARMs where the page table walk can't use the data cache, it's been measured that page table manipulation operations can still benefit from placing the page tables in write-through or bufferable memory.
        So with that in mind, this set of changes updates the OS to allow cacheable/bufferable page tables to be used by the OS + MMU, using a system-appropriate cache policy.
        File changes:
        - hdr/KernelWS - Allocate workspace for storing the page flags that are to be used by the page tables
        - hdr/OSMem - Re-specify CP_CB_AlternativeDCache as having a different behaviour on ARMv6+ (inner write-through, outer write-back)
        - hdr/Options - Add CacheablePageTables option to allow switching back to non-cacheable page tables if necessary. Add SyncPageTables var which will be set {TRUE} if either the OS or the architecture requires a DSB after writing to a faulting page table entry.
        - s/ARM600, s/VMSAv6 - Add new SetTTBR & GetPageFlagsForCacheablePageTables functions. Update VMSAv6 for wider XCBTable (now 2 bytes per element)
        - s/ARMops - Update pre-ARMv7 MMU_Changing ARMops to drain the write buffer on entry if cacheable pagetables are in use (ARMv7+ already has this behaviour due to architectural requirements). For VMSAv6 Normal memory, change the way that the OS encodes the cache policy in the page table entries so that it's more compatible with the encoding used in the TTBR.
        - s/ChangeDyn - Update page table page flag handling to use PageTable_PageFlags. Make use of new PageTableSync macro.
        - s/Exceptions, s/AMBControl/memmap - Make use of new PageTableSync macro.
        - s/HAL - Update MMU initialisation sequence to make use of PageTable_PageFlags + SetTTBR
        - s/Kernel - Add PageTableSync macro, to be used after any write to a faulting page table entry
        - s/MemInfo - Update OS_Memory 0 page flag conversion. Update OS_Memory 24 to use new symbol for page table access permissions.
        - s/MemMap2 - Use PageTableSync. Add routines to enable/disable cacheable pagetables
        - s/NewReset - Enable cacheable pagetables once we're fully clear of the MMU initialisation sequence (doing so earlier would be trickier due to potential double-mapping)
      Admin:
        Tested on pretty much everything currently supported
        Delivers moderate performance benefits to page table ops on old systems (e.g. 10% faster), astronomical benefits on some new systems (up to 8x faster)
        Stats: https://www.riscosopen.org/forum/forums/3/topics/2728?page=2#posts-58015
      
      
      Version 5.71. Tagged as 'Kernel-5_71'
      65fa6a28
    • Jeffrey Lee's avatar
      Make MMU_Changing ARMops perform the sub-operations in a sensible order · 9a96263a
      Jeffrey Lee authored
      Detail:
        For a while we've known that the correct way of doing cache maintenance on ARMv6+ (e.g. when converting a page from cacheable to non-cacheable) is as follows:
        1. Write new page table entry
        2. Flush old entry from TLB
        3. Clean cache + drain write buffer
        The MMU_Changing ARMops (e.g. MMU_ChangingEntry) implement the last two items, but in the wrong order. This has caused the operations to fall out of favour and cease to be used, even in pre-ARMv6 code paths where the effects of improper cache/TLB management perhaps weren't as readily visible.
        This change re-specifies the relevant ARMops so that they perform their sub-operations in the correct order to make them useful on modern ARMs, updates the implementations, and updates the kernel to make use of the ops wherever relevant.
        File changes:
        - Docs/HAL/ARMop_API - Re-specify all the MMU_Changing ARMops to state that they are for use just after a page table entry has been changed (as opposed to before - e.g. 5.00 kernel behaviour). Re-specify the cacheable ones to state that the TLB invalidation comes first.
        - s/ARM600, s/ChangeDyn, s/HAL, s/MemInfo, s/VMSAv6, s/AMBControl/memmap - Replace MMU_ChangingUncached + Cache_CleanInvalidate pairs with equivalent MMU_Changing op
        - s/ARMops - Update ARMop implementations to do everything in the correct order
        - s/MemMap2 - Update ARMop usage, and get rid of some lingering sledgehammer logic from ShuffleDoublyMappedRegionForGrow
      Admin:
        Tested on pretty much everything currently supported
      
      
      Version 5.70. Tagged as 'Kernel-5_70'
      9a96263a
    • Jeffrey Lee's avatar
      Reimplement AMBControl on top of the PMP system · cefb4815
      Jeffrey Lee authored
      Detail:
        With this set of changes, each AMB node is now the owner of a fake DANode which is linked to a PMP.
        From a user's perspective the behaviour of AMBControl is the same as before, but rewriting it to use PMPs internally offers the following (potential) benefits:
        * Reduction in the amount of code which messes with the CAM & page tables, simplifying future work/maintenance. Some of the AMB ops (grow, shrink) now just call through to OS_ChangeDynamicArea. However all of the old AMB routines were well-optimised, so to avoid a big performance hit for common operations not all of them have been removed (e.g. mapslot / mapsome). Maybe one day these optimal routines will be made available for use by regular PMP DAs.
        * Removal of the slow Service_MemoryMoved / Service_PagesSafe handlers that had to do page list fixup after the core kernel had reclaimed/moved pages. Since everything is a PMP, the kernel will now deal with this on behalf of AMB.
        * Removal of a couple of other slow code paths (e.g. Do_AMB_MakeUnsparse calls from OS_ChangeDynamicArea)
        * Potential for more flexible mapping of application space in future, e.g. sparse allocation of memory to the wimp slot
        * Simpler transition to an ASID-based task swapping scheme on ARMv6+?
        Other changes of note:
        * AMB_LazyMapIn switch has been fixed up to work correctly (i.e. turning it off now disables lazy task swapping and all associated code instead of producing a build error)
        * The DANode for the current app should be accessed via the GetAppSpaceDANode macro. This will either return the current AMB DANode, or AppSpaceDANode (if e.g. pre-Wimp). However be aware that AppSpaceDANode retains the legacy behaviour of having a base + size relative to &0, while the AMB DANodes (identifiable via the PMP flag) are sane and have their base + size relative to &8000.
        * Mostly-useless DebugAborts switch removed
        * AMBPhysBin (page number -> phys addr lookup table) removed. Didn't seem to give any tangible performance benefit, and was imposing hidden restrictions on memory usage (all phys RAM fragments in PhysRamTable must be multiple of 512k). And if it really was a good optimisation, surely it should have been applied to all areas of the kernel, not just AMB!
        Other potential future improvements:
        * Turn the fake DANodes into real dynamic areas, reducing the amount of special code needed in some places, but allow the DAs to be hidden from OS_DynamicArea 3 so that apps/users won't get too confused
        * Add a generic abort trapping system to PMPs/DAs (lazy task swapping abort handler is still a special case)
        File changes:
        - s/ARM600, s/VMSAv6, s/ExtraSWIs - Remove DebugAborts
        - s/ArthurSWIs - Remove AMB service call handler dispatch
        - s/ChangeDyn - AMB_LazyMapIn switch fixes. Add alternate internal entry points for some PMP ops to allow the DANode to be specified (used by AMB)
        - s/Exceptions - Remove DebugAborts, AMB_LazyMapIn switch fixes
        - s/Kernel - Define GetAppSpaceDANode macro, AMB_LazyMapIn switch fix
        - s/MemInfo - AMB_LazyMapIn switch fixes
        - s/AMBControl/AMB - Update GETs
        - s/AMBControl/Memory - Remove block size quantisation, AMB_BlockResize (page list blocks are now allocated by PMP code)
        - s/AMBControl/Options - Remove PhysBin definitions, AMBMIRegWords (moved to Workspace file), AMB_LimpidFreePool switch. Add AMB_Debug switch.
        - s/AMBControl/Workspace - Update AMBNode to contain an embedded DANode. Move AMBMIRegWords here from Options file.
        - s/AMBControl/allocate - Fake DA node initialisation
        - s/AMBControl/deallocate - Add debug output
        - s/AMBControl/growp, growshrink, mapslot, mapsome, shrinkp - Rewrite to use PMP ops where possible, add debug output
        - s/AMBControl/main - Remove PhysBin initialisation. Update the enumerate/mjs_info call.
        - s/AMBControl/memmap - Low-level memory mapping routines updated or rewritten as appropriate.
        - s/AMBControl/readinfo - Update to cope with DANode
        - s/AMBControl/service - Remove old service call handlers
        - s/AMBControl/handler - DA handler for responding to PMP calls from OS_ChangeDynamicArea; just calls through to growpages/shrinkpages as appropriate.
      Admin:
        Tested on pretty much everything currently supported
      
      
      Version 5.66. Tagged as 'Kernel-5_66'
      cefb4815
  14. 02 Aug, 2016 1 commit
    • Jeffrey Lee's avatar
      Add support for shareable pages and additional access privileges · 9cd4cbe4
      Jeffrey Lee authored
      Detail:
        This set of changes:
        * Refactors page table entry encoding/decoding so that it's (mostly) performed via functions in the MMU files (s.ARM600, s.VMSAv6) rather than on an ad-hoc basis as was the case previously
        * Page table entry encoding/decoding performed during ROM init is also handled via the MMU functions, which resolves some cases where the wrong cache policy was in use on ARMv6+
        * Adds basic support for shareable pages - on non-uniprocessor systems all pages will be marked as shareable (however, we are currently lacking ARMops which broadcast cache maintenance operations to other cores, so safe sharing of cacheable regions isn't possible yet)
        * Adds support for the VMSA XN flag and the "privileged ROM" access permission. These are exposed via RISC OS access privileges 4 and above, taking advantage of the fact that 4 bits have always been reserved for AP values but only 4 values were defined
        * Adds OS_Memory 17 and 18 to convert RWX-style access flags to and from RISC OS access privilege numbers; this allows us to make arbitrary changes to the mappings of AP values 4+ between different OS/hardware versions, and allows software to more easily cope with cases where the most precise AP isn't available (e.g. no XN on <=ARMv5); a sketch of the conversion call follows this list
        * Extends OS_Memory 24 (CheckMemoryAccess) to return executability information
        * Adds exported OSMem header containing definitions for OS_Memory and OS_DynamicArea
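
        A sketch of the OS_Memory 17 conversion mentioned above; the reason
        code is from this change, but the register usage (flags in on R1, AP
        number back in R1) and the flag encoding are assumptions for
        illustration only:

          #include <stdio.h>
          #include "kernel.h"
          #include "swis.h"

          int main(void)
          {
              unsigned ap;
              unsigned wanted = 0x13;   /* hypothetical "user R, privileged RW, no execute" encoding */
              _kernel_oserror *e = _swix(OS_Memory, _INR(0,1) | _OUT(1),
                                         17, wanted, &ap);
              if (e != NULL) { printf("error: %s\n", e->errmess); return 1; }
              printf("nearest available access privilege: %u\n", ap);
              return 0;
          }
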
        File changes:
        - Makefile - export C and assembler versions of hdr/OSMem
        - Resources/UK/Messages - Add more text for OS_Memory errors
        - hdr/KernelWS - Correct comment regarding DCacheCleanAddress. Allocate workspace for MMU_PPLTrans and MMU_PPLAccess.
        - hdr/OSMem - New file containing exported OS_Memory and OS_DynamicArea constants, and public page flags
        - hdr/Options - Reduce scope of ARM6support to only cover builds which require ARMv3 support
        - s/AMBControl/Workspace - Clarify AMBNode_PPL usage
        - s/AMBControl/growp, mapslot, mapsome, memmap - Use AreaFlags_ instead of AP_
        - s/AMBControl/main, memmap - Use GetPTE instead of generating page table entry manually
        - s/ARM600 - Remove old comments relating to lack of stack. Update BangCam to use GetPTE. Update PPL tables, removing PPLTransL1 (L1 entries are now derived from L2 table on demand) and adding a separate table for ARM6. Implement the ARM600 versions of the Get*PTE ('get page table entry') and Decode*Entry functions
        - s/ARMops - Add Init_PCBTrans function to allow relevant MMU_PPLTrans/MMU_PCBTrans pointers to be set up during the pre-MMU stage of ROM init. Update ARM_Analyse to set up the pointers that are used post MMU init.
        - s/ChangeDyn - Move a bunch of flags to hdr/OSMem. Rename the AP_ dynamic area flags to AreaFlags_ to avoid name clashes and confusion with the page table AP_ values exported by Hdr:MEMM.ARM600/Hdr:MEMM.VMSAv6. Also generate the relevant flags for OS_Memory 24 so that it can refer to the fixed areas by their name instead of hardcoding the permissions.
        - s/GetAll - GET Hdr:OSMem
        - s/HAL - Change initial page table setup to use DA/page flags and GetPTE instead of building page table entries manually. Simplify AllocateL2PT by removing the requirement for the user to supply the access permissions that will be used for the area; instead for ARM6 we just assume that cacheable memory is the norm and set L1_U for any L1 entry we create here.
        - s/Kernel - Add GetPTE macro (for easier integration of Get*PTE functions) and GenPPLAccess macro (for easy generation of OS_Memory 24 flags)
        - s/MemInfo - Fixup OS_Memory 0 to not fail on seeing non-executable pages. Implement OS_Memory 17 & 18. Tidy up some error generation. Make OS_Memory 13 use GetPTE. Extend OS_Memory 24 to return (non-) executability information, to use the named CMA_ constants generated by s/ChangeDyn, and to use the Decode*Entry functions when it's necessary to decode page table entries.
        - s/NewReset - Use AreaFlags_ instead of AP_
        - s/VMSAv6 - Remove old comments relating to lack of stack. Update BangCam to use GetPTE. Update PPL tables, removing PPLTransL1 (L1 entries are now derived from L2 table on demand) and adding a separate table for shareable pages. Implement the VMSAv6 versions of the Get*PTE and Decode*Entry functions.
      Admin:
        Tested on Raspberry Pi 1, Raspberry Pi 3, Iyonix, RPCEmu (ARM6 & ARM7), comparing before and after CAM and page table dumps to check for any unexpected differences
      
      
      Version 5.55. Tagged as 'Kernel-5_55'
      9cd4cbe4
  15. 30 Jun, 2016 1 commit
    • Jeffrey Lee's avatar
      Delete pre-HAL and 26bit code · 7d5bfc66
      Jeffrey Lee authored
      Detail:
        This change gets rid of the following switches from the source (picking appropriate code paths for a 32bit HAL build):
        * HAL
        * HAL26
        * HAL32
        * No26bitCode
        * No32bitCode
        * IncludeTestSrc
        * FixR9CorruptionInExtensionSWI
        Various old files have also been removed (POST code, Arc/STB keyboard drivers, etc.)
      Admin:
        Identical binary to previous revision for IOMD & Raspberry Pi builds
      
      
      Version 5.49. Tagged as 'Kernel-5_49'
      7d5bfc66
  16. 19 May, 2016 1 commit
    • Jeffrey Lee's avatar
      Add new OS_PlatformFeatures reason code for reading CPU features (inspired by... · 9944f0f8
      Jeffrey Lee authored
      Add new OS_PlatformFeatures reason code for reading CPU features (inspired by ARMv6+ CPUID scheme). Add OS_ReadSysInfo 8 flags for indicating the alignment mode the ROM was built with. Fix long-standing bug with OS_PlatformFeatures when an unknown reason code is used.
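
      A sketch of reading the new flags from C; reason code 34 is from this
      change, but the assumption that R1 is reserved and that the feature
      flags come back in R0 is mine (OS_PlatformFeatures should be defined
      by swis.h):

        #include <stdio.h>
        #include "kernel.h"
        #include "swis.h"

        int main(void)
        {
            unsigned features;
            _kernel_oserror *e = _swix(OS_PlatformFeatures, _INR(0,1) | _OUT(0),
                                       34, 0 /* assumed reserved */, &features);
            if (e != NULL) {
                /* Older kernels don't know reason 34; thanks to the fix above
                 * the X-form SWI now returns an error rather than raising one. */
                printf("not supported: %s\n", e->errmess);
                return 1;
            }
            printf("CPU feature flags: &%08x\n", features);
            return 0;
        }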
      
      Detail:
        s/CPUFeatures, hdr/OSMisc, hdr/KernelWS - Code and definitions for reading CPU features and reporting them via OS_PlatformFeatures 34. All the instruction set features which are exposed by the CPUID scheme and which are relevant to RISC OS are exposed, along with a few extra flags which we derive ourselves (e.g. things relating to < ARMv4, and some register usage restrictions in instructions). s/CPUFeatures is designed to be easily copyable into a future version of CallASWI without requiring any changes.
        s/ARMops - Read and cache CPU features during ARMop initialisation
        s/GetAll - GET new file
        s/Kernel - Hook up the CPU features code to OS_PlatformFeatures. Fix a long standing stack imbalance bug (fixed in RISC OS 3.8, but never merged back to our main branch) which meant that calling OS_PlatformFeatures with an invalid reason code would raise an error, even if it was the X form of the SWI that was called. Similar fix also applied to the unused service call code, along with a fix for the user's R1-R9 being corrupt (shuffled up one place) should an error have been generated.
        s/MemInfo - Extra LTORG needed to keep things happy
        s/Middle - Extend OS_ReadSysInfo 8 to include flags for indicating what memory alignment mode (if any) the OS relies upon. Together with OS_PlatformFeatures 34 this could e.g. be used by !CPUSetup to determine which options should be offered to the user.
      Admin:
        Tested on Raspberry Pi 1, 2, 3
      
      
      Version 5.35, 4.79.2.319. Tagged as 'Kernel-5_35-4_79_2_319'
      9944f0f8
  17. 27 Mar, 2016 1 commit
    • Jeffrey Lee's avatar
      Improve safety of OS_Memory 0 "make temporarily uncacheable" and *Cache off · 6eee32dd
      Jeffrey Lee authored
      Detail:
        s/MemInfo - Wrap OS_Memory 0 in some code which will temporarily claim the FIQ vector when making pages temporarily uncacheable, to avoid any issues caused by modern ARMs ignoring unexpected cache hits
        s/VMSAv6 - Claim FIQs when OS_MMUControl is asked to make a change to the SCTLR, to avoid similar issues on modern ARMs. Also make the stack temporarily uncacheable before disabling the cache, so that we don't run into any problems using the stack in between disabling the cache and completing the clean+invalidate.
      Admin:
        Tested on Pi 2B, 3B
        *Cache off now works reliably on Pi 2B, although there is sometimes a pause of a few seconds while things sort themselves out (USB?)
        *Cache off "works" on Pi 3B but everything will fall over soon afterwards due to the Cortex-A53 not supporting LDREX/STREX to non-cacheable pages (or when the page is effectively non-cacheable, i.e. cacheable page with cache disabled)
      
      
      Version 5.35, 4.79.2.311. Tagged as 'Kernel-5_35-4_79_2_311'
      6eee32dd
  18. 12 Mar, 2016 1 commit
    • Jeffrey Lee's avatar
      Fix crash when making SVC stack uncacheable. Fix poor Pi 3 memory benchmark performance · a941a778
      Jeffrey Lee authored
      Detail:
        s/MemInfo - To avoid cache coherency issues when the current SVC stack page is being made uncacheable, shift SP somewhere else by temporarily dropping into IRQ mode
        s/ARMops - Change default VMSAv6 cache policy to writeback, write allocate. Unlike other CPUs we've supported so far, Cortex-A53 suffers very badly from writes to read-allocate pages, with performance being roughly equivalent to writes to non-cacheable memory. Using a write (+read) allocate policy seems to be needed to get the expected performance, and may help boost other CPUs too.
      Admin:
        Tested on IGEPv5, Pi 3
      
      
      Version 5.35, 4.79.2.307. Tagged as 'Kernel-5_35-4_79_2_307'
      a941a778
  19. 10 Mar, 2016 1 commit
    • Jeffrey Lee's avatar
      Cache maintenance fixes · b0682acb
      Jeffrey Lee authored
      Detail:
        This set of changes tackles two main issues:
        * Before mapping out a cacheable page or making it uncacheable, the OS performs a cache clean+invalidate op. However this leaves a small window where data may be fetched back into the cache, either accidentally (dodgy interrupt handler) or via aggressive prefetch (as allowed for by the architecture). This rogue data can then result in coherency issues once the pages are mapped out or made uncacheable a short time later.
          The fix for this is to make the page uncacheable before performing the cache maintenance (although this isn't ideal, as prior to ARMv7 it's implementation defined whether address-based cache maintenance ops affect uncacheable pages or not - and on ARM11 it seems that they don't, so for that CPU we currently force a full cache clean instead)
        * Modern ARMs generally ignore unexpected cache hits, so there's an interrupt hole in the current OS_Memory 0 "make temporarily uncacheable" implementation where the cache is being flushed after the page has been made uncacheable (consider the case of a page that's being used by an interrupt handler, but the page is being made uncacheable so it can also be used by DMA). As well as affecting ARMv7+ devices this was found to affect XScale (and ARM11, although untested for this issue, would have presumably suffered from the "can't clean uncacheable pages" limitation)
          The fix for this is to disable IRQs around the uncache sequence - however FIQs are currently not being dealt with, so there's still a potential issue there.
        File changes:
        - Docs/HAL/ARMop_API, hdr/KernelWS, hdr/OSMisc - Add new Cache_CleanInvalidateRange ARMop
        - s/ARM600, s/VMSAv6 - BangCam updated to make the page uncacheable prior to flushing the cache. Add GetTempUncache macro to help with calculating the page flags required for making pages uncacheable. Fix abort in OS_MMUControl on Raspberry Pi - MCR-based ISB was resetting ZeroPage pointer to 0
        - s/ARMops - Cache_CleanInvalidateRange implementations. PL310 MMU_ChangingEntry/MMU_ChangingEntries refactored to rely on Cache_CleanInvalidateRange_PL310, which should be a more optimal implementation of the cache cleaning code that was previously in MMU_ChangingEntry_PL310.
        - s/ChangeDyn - Rename FastCDA_UpFront to FastCDA_Bulk, since the cache maintenance is no longer performed upfront. CheckCacheabilityR0ByMinusR2 now becomes RemoveCacheabilityR0ByMinusR2. PMP LogOp implementation refactored quite a bit to perform cache/TLB maintenance after making page table changes instead of before. One flaw with this new implementation is that mapping out large areas of cacheable pages will result in multiple full cache cleans while the old implementation would have (generally) only performed one - a two-pass approach over the page list would be needed to solve this.
        - s/GetAll - Change file ordering so GetTempUncache macro is available earlier
        - s/HAL - ROM decompression changed to do full MMU_Changing instead of MMU_ChangingEntries, to make sure earlier cached data is truly gone from the cache. ClearPhysRAM changed to make page uncacheable before flushing cache.
        - s/MemInfo - OS_Memory 0 interrupt hole fix
        - s/AMBControl/memmap - AMB_movepagesout_L2PT now split into cacheable+non-cacheable variants. Sparse map out operation now does two passes through the page list so that they can all be made uncacheable prior to the cache flush + map out.
      Admin:
        Tested on StrongARM, XScale, ARM11, Cortex-A7, Cortex-A9, Cortex-A15, Cortex-A53
        Appears to fix the major issues plaguing SATA on IGEPv5
      
      
      Version 5.35, 4.79.2.306. Tagged as 'Kernel-5_35-4_79_2_306'
      b0682acb
  20. 01 Sep, 2015 1 commit
    • Jeffrey Lee's avatar
      Remove OS_Memory 10 and associated code · 6ee2f464
      Jeffrey Lee authored
      Detail:
        s/MemInfo - Remove OS_Memory 10 (free pool locking). Locking the free pool has never been a very nice thing to do, so now that there's no logical mapping of the free pool it seems like it's a good time to outlaw the behaviour altogether.
        s/ChangeDyn - No free pool locking means one less thing to check when claiming the OS_ChangeDynamicArea mutex.
        hdr/KernelWS - VRAMRescue_control workspace variable is no longer needed
      Admin:
        Tested on Pandaboard
      
      
      Version 5.35, 4.79.2.285. Tagged as 'Kernel-5_35-4_79_2_285'
      6ee2f464
  21. 31 Aug, 2015 1 commit
    • Jeffrey Lee's avatar
      Add initial support for "physical memory pools" · 54872d8c
      Jeffrey Lee authored
      Detail:
        This set of changes adds support for "physical memory pools" (aka PMPs), a new type of dynamic area which allow physical pages to be claimed/allocated without mapping them in to the logical address space. PMPs have full control over which physical pages they use (similar to DAs which request specific physical pages), and also have full control over the logical mapping of their pages (which pages go where, and per-page access/cacheability control).
        Currently the OS makes use of two PMPs: one for the free pool (which now has a logical size of zero - freeing up gigabytes of logical space), and one for the RAM disc (logical size of 1MB, allowing for a physical size limited only by the amount of free memory)
        Implementing these changes has required a number of other changes to be made:
        * The CAM has been expanded from 8 bytes per entry to 16 bytes per entry, in order to allow each RAM page to store information about its PMP association
        * The system heap has been expanded to 32MB in size (from just under 4MB), in order to allow it to be used to store PMP page lists (1 word needed per page, but PMP pages may not always have physical pages assigned to them - so to allow multiple large PMPs to exist we need more than just 1 word per RAM page)
        * The &FA000000-&FBFFFFFF area of fixed kernel workspace has been shuffled around to accommodate the larger CAM, and the system heap is now located just above the RMA.
        * SoftResets code stripped out (unlikely we'll ever want to fix and re-enable it)
        * A couple of FastCDA options are now permanently on
        * Internal page flags shuffled around a bit. PageFlags_Unavailable now publicly exposed so that PMP clients can lock/unlock pages at will.
        * When OS_ChangeDynamicArea is asked to grow or shrink the free pool, it now implicitly converts it into a shrink or grow of application space (which is what would happen anyway). This simplifies the implementation; during a grow, pages (or replacement pages) are always sourced from the free pool, and during a shrink pages are always sent to the free pool.
        File changes:
        - hdr/KernelWS - Extend DANode structure. Describe CAM format. Adjust kernel workspace.
        - hdr/OSRSI6, s/Middle - Add new item to expose the CAM format
        - hdr/Options - Remove SoftResets switch. Add some PMP switches.
        - s/ARM600, s/VMSAv6 - Updated for new CAM format. Note that although the CAM stores PMP information, BangCamUpdate currently doesn't deal with updating that data - it's the caller's responsibility to do so where appropriate.
        - s/ChangeDyn - Lots of changes to implement PMP support, and to cope with the new CAM format.
        - s/HAL - Updated to cope with new CAM format, and lack of logical mapping of free pool.
        - s/MemInfo - Updated to cope with new CAM format. OS_Memory 0 updated to cope with converting PPN to PA for pages which are mapped out. OS_Memory 24 updated to decode the access permissions on a per-page basis for PMPs, and fixed its HWM usage for sparse DAs.
        - s/NewReset - Soft reset code and unused AddCamEntries function removed. Updated to cope with new CAM format, PMP free pool, PMP RAMFS
        - s/AMBControl/allocate - Update comment (RMA hasn't been used for AMBControl nodes for a long time)
        - s/AMBControl/growp, s/AMBControl/memmap, s/AMBControl/shrinkp - Update for new CAM format + PMP free pool
        - s/vdu/vdudriver - Strip out soft reset code.
      Admin:
        Tested on Pandaboard
        This is just a first iteration of the PMP feature; with any luck future changes will improve functionality. This means the APIs are subject to change as well.
      
      
      Version 5.35, 4.79.2.284. Tagged as 'Kernel-5_35-4_79_2_284'
      54872d8c