1. 28 Jul, 2021 3 commits
    • Add AP 1 emulation for long descriptor page tables · f93d930d
      Jeffrey Lee authored
      The long descriptor page table format doesn't support RISC OS access
      privilege 1 (user RX, privileged RWX). Previously we were downgrading
      this to AP 0 (user RWX, privileged RWX), which obviously weakens the
      security of the memory. However now that we have an AbortTrap
      implementation, we can map the memory as "user none, privileged RWX" and
      provide user read support via AbortTrap's instruction decode & execute
      logic.
      
      There's no support for executing usermode code from the memory, but the
      compatibility issues caused by that are likely to be minimal.
    • Add abortable DA support · fccd5e2f
      Jeffrey Lee authored
      This implementation should be compatible with RISCOS Ltd's
      implementation.
    • Allow RW/ZI sections to be used · 2b665896
      Jeffrey Lee authored
      * Instruct the linker to place any RW/ZI data sections in the last ~16MB
      of the memory map, starting from &ff000000 (with the current toolchain,
      giving it a fixed base address is much easier than giving it a variable
      base address)
      * The RW/ZI section is mapped as completely inaccessible to user mode
      * The initial content of the RW section is copied over shortly after MMU
      startup (in Continue_after_HALInit)
      * Since link's -bin option produces a file containing a copy of the
      (zero-initialised) ZI section, the kernel binary is now produced from a
      "binary with AIF header" AIF with the help of the new 'kstrip' tool.
      kstrip extracts just the RO and RW sections, ensuring the ROM doesn't
      contain a redundant block of zeros for the ZI section.
      
      This should make it easier to use C code & arbitrary libraries within
      the kernel, providing they're compiled with suitable settings (e.g.
      non-module, no FP, no stack checking, like HALs typically use)
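      
      A minimal C sketch of the startup work described above (the kernel does
      this in assembler in Continue_after_HALInit; the Image$$...$$ symbols
      follow the usual ARM linker convention, and the assumption here is that
      the RW initialisation data sits immediately after the RO section in the
      ROM image):
      
      /* Hedged sketch: copy the RW section's initial contents to its
       * execution address near &ff000000, then clear the ZI section that
       * kstrip removed from the ROM image. Symbol names are the conventional
       * ARM linker region symbols, assumed to apply to this link. */
      #include <string.h>
      
      extern char Image$$RO$$Limit[];   /* end of RO; assumed load address of RW data */
      extern char Image$$RW$$Base[];    /* execution address of the RW section        */
      extern char Image$$RW$$Limit[];   /* end of initialised RW data                 */
      extern char Image$$ZI$$Base[];    /* start of zero-initialised data             */
      extern char Image$$ZI$$Limit[];   /* end of zero-initialised data               */
      
      void init_rw_zi(void)
      {
          /* Copy the initial content of the RW section into place */
          memcpy(Image$$RW$$Base, Image$$RO$$Limit,
                 (size_t)(Image$$RW$$Limit - Image$$RW$$Base));
      
          /* The ROM no longer carries a block of zeros for ZI, so clear it here */
          memset(Image$$ZI$$Base, 0,
                 (size_t)(Image$$ZI$$Limit - Image$$ZI$$Base));
      }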
  2. 28 Apr, 2021 5 commits
    • Support runtime selection of pagetable format · ba993cb5
      Jeffrey Lee authored
      Runtime selection between long descriptor and short descriptor page
      table format is now possible (with the decision based on whether the HAL
      registers any high RAM or not). The main source changes are as follows:
      
      * LongDesc and ShortDesc switches are in hdr.Options to control what
      kernel variant is built
      * PTOp and PTWhich macros introduced in hdr.ARMops to allow for
      invocation of functions / code blocks which are specific to the page
      table format. If the kernel is being built with only one page table
      format enabled, PTOp is just a BL instruction, ensuring there's no
      performance loss compared to the old code.
      * _LongDesc and _ShortDesc suffixes added to various function names, to
      allow both versions of the function to be included at once if runtime
      selection is enabled
      * Most of the kernel / MMU initialisation code in s.HAL is now encased
      in a big WHILE loop, allowing it to be duplicated if runtime switching
      is enabled (easier than adding dynamic branches all over the place, and
      only costs a few KB of ROM/RAM)
      * Some more functions (notably AccessPhysicalAddress,
      ReleasePhysicalAddress, and MapInIO) have been moved to s.ShortDesc /
      s.LongDesc since they were already 90% specific to page table format
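      
      A conceptual C analogue of the PTOp/PTWhich idea (the real kernel uses
      assembler macros, and a plain BL in single-format builds; every name
      below is hypothetical and only illustrates the dispatch pattern):
      
      /* Hypothetical sketch of build-time vs runtime dispatch between the
       * short descriptor and long descriptor page table implementations. */
      typedef struct {
          void (*AllocateBackingLevel2)(void *params);
          void (*UpdatePageReplacement)(void *params);
          /* ...one entry per format-specific routine... */
      } PageTableOps;
      
      extern const PageTableOps PT_ShortDesc;   /* short descriptor implementation */
      extern const PageTableOps PT_LongDesc;    /* long descriptor implementation  */
      
      #if defined(ONLY_ONE_FORMAT)
      /* Single-format build: "PTOp" degenerates to a direct call, so there is
       * no performance loss compared to the old code. */
      #define PTOp(op, params) PT_ShortDesc.op(params)
      #else
      /* Runtime-selection build: pick the table once during MMU init (e.g.
       * based on whether the HAL registered any high RAM), then indirect. */
      extern const PageTableOps *PT_Current;
      #define PTOp(op, params) PT_Current->op(params)
      #endif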
    • Add Service_PagesUnsafe64 & PagesSafe64 · 15a7d5ee
      Jeffrey Lee authored
      These use a page block with 64bit address fields (matching OS_Memory
      64). The page list(s) contain the full list of pages involved in the
      operation, unlike the 32bit PagesUnsafe / PagesSafe calls, which only
      list pages which have 32bit addresses. The kernel issues the service
      calls in the following order:
      
      1. Service_PagesUnsafe64
      2. Service_PagesUnsafe
      3. Service_PagesSafe
      4. Service_PagesSafe64
      
      Since only one PagesUnsafe operation can occur at a time, a program
      which supports both service calls can safely ignore the PagesUnsafe /
      PagesSafe calls if a PagesUnsafe64 operation is in progress (the
      PagesUnsafe call will only list a subset of the pages from the
      PagesUnsafe64 call). The 32bit PagesUnsafe / PagesSafe calls will be
      skipped if no 32bit pages are being replaced.
      
      The addition of these calls means that NeedsSpecificPages DAs (and PMPs)
      can now request pages which have large physical addresses.
      
      Note that the page replacement logic now has the restriction that pages
      which have 32bit physical addresses can only be replaced by other pages
      which have 32bit physical addresses. This is necessary to ensure that
      users of the old 32bit APIs see the page replacement take place. However
      it does mean that programs will be unable to claim pages of low RAM
      which are in use if there are not enough free low RAM pages in the free
      pool.
      
      A future optimisation would be to update the service calls so that they
      don't list required pages which are in the free pool; if all the
      required pages are in the free pool this would allow the service calls
      (and FIQ claiming) to be skipped completely.
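      
      A hedged sketch of how a client that understands both generations of the
      service calls might apply the rule above; the service numbers are taken
      from a hypothetical header and the handler routines are placeholders,
      not real definitions:
      
      #include "ServiceNumbers.h"  /* hypothetical: defines Service_PagesUnsafe(64) etc. */
      
      extern void handle_full_pagelist_unsafe(void *pageblock);   /* hypothetical */
      extern void handle_full_pagelist_safe(void *pageblock);     /* hypothetical */
      extern void handle_legacy_32bit(int service, void *pageblock);
      
      static int pages_unsafe64_active;
      
      void memory_service_handler(int service, void *pageblock)
      {
          switch (service)
          {
          case Service_PagesUnsafe64:                  /* issued first */
              pages_unsafe64_active = 1;
              handle_full_pagelist_unsafe(pageblock);
              break;
          case Service_PagesSafe64:                    /* issued last */
              pages_unsafe64_active = 0;
              handle_full_pagelist_safe(pageblock);
              break;
          case Service_PagesUnsafe:
          case Service_PagesSafe:
              /* These only carry the 32bit-addressable subset of the pages,
               * so they can be ignored if the 64bit call was already handled. */
              if (!pages_unsafe64_active)
                  handle_legacy_32bit(service, pageblock);
              break;
          }
      }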
    • Add OS_Memory 64, to supersede OS_Memory 0 · d5e91a02
      Jeffrey Lee authored
      OS_Memory 64 is an extended form of OS_Memory 0 which uses 64bit
      addresses instead of 32bit. Using 64bit physical addresses allows
      conversions to/from physical addresses to be performed on pages with
      large physical addresses. Using 64bit logical addresses provides some
      future-proofing for an AArch64 version of RISC OS, with a 64bit logical
      memory map.
    • Define OS_Memory 0 page block format · 7ddbbeed
      Jeffrey Lee authored
      Add to s.ChangeDyn a definition of the OS_Memory 0 page block format,
      and update all relevant code to use those definitions instead of
      hardcoded offsets.
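      
      For illustration, a hedged C sketch of a three-word-per-entry page block
      being passed to OS_Memory 0 via _swix. The field order and the exact
      conversion flag bits should be checked against the OS_Memory 0
      documentation rather than taken from this sketch:
      
      #include "kernel.h"
      #include "swis.h"
      
      typedef struct {
          unsigned int page_number;    /* physical page number                     */
          unsigned int logical_addr;   /* logical address                          */
          unsigned int physical_addr;  /* physical address                         */
      } os_memory0_entry;              /* field order assumed - verify against docs */
      
      _kernel_oserror *convert_pages(os_memory0_entry *block, int count,
                                     unsigned int conversion_flags)
      {
          /* R0 = reason 0 in bits 0-7 plus the "given/wanted" conversion flags
           * (placeholder here), R1 = page block pointer, R2 = entry count. */
          return _swix(OS_Memory, _INR(0,2), conversion_flags, block, count);
      }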
    • Support RAM banks with high physical addresses · df4efb68
      Jeffrey Lee authored
      This changes PhysRamTable to store the address of each RAM bank in terms
      of (4KB) pages instead of bytes, effectively allowing it to support a 44
      bit physical address space. This means that (when the long descriptor
      page table format is used) the OS can now make use of memory located
      outside the lower 4GB of the physical address space. However some
      public APIs still need extending to allow for all operations to be
      supported on high RAM (e.g. OS_Memory logical to physical address
      lookups)
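      
      The arithmetic behind the 44 bit figure: a 32bit PhysRamTable entry now
      counts 4KB pages rather than bytes, so it can describe 2^32 pages of
      2^12 bytes each, i.e. a 2^44 byte physical address space. A small C
      sketch of the conversion back to a byte address:
      
      #include <stdint.h>
      
      /* 4KB pages: a 32-bit page number plus a 12-bit page offset gives a
       * 44-bit physical address. */
      static inline uint64_t phys_addr_from_page(uint32_t page_number)
      {
          return (uint64_t)page_number << 12;
      }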
      
      OS_Memory 12 (RecommendPage) has been extended to allow R4-R7 to be used
      to specify a (64bit) physical address range which the recommended pages
      must lie within. For backwards compatibility this defaults to 0-4GB.
  3. 17 Mar, 2021 2 commits
    • Remove CAM size limit · 79bc3343
      Jeffrey Lee authored
      Previously the CAM sat inside a fixed 16MB window, restricting it to
      storing the details of 1 million pages, i.e. 4GB of RAM. Shuffle things
      around a bit to allow this restriction to be removed: the CAM is now
      located just above the IO region, and the CAM start address /
      IO top will be calculated appropriately during kernel init. This change
      paves the way for us to support machines with over 4GB of RAM.
      
      FixedAreasTable has also been removed, since it's no longer really
      necessary (DAs can only be created between the top of application space
      and the bottom of the used IO space, and it's been a long time since
      we've had any fixed bits in the middle of there)
    • Initial long descriptor support · b51b5540
      Jeffrey Lee authored
      This adds initial support for the "long descriptor" MMU page table
      format, which allows the CPU to (flexibly) use a 40-bit physical address
      space.
      
      There are still some features that need fixing (e.g. RISCOS_MapInIO
      flags), and the OS doesn't yet support RAM above the 32bit limit, but
      this set of changes is enough to allow for working ROMs to be produced.
      
      Also, move MMUControlSoftCopy initialisation out of ClearWkspRAM, since
      it's unrelated to whether the HAL has cleared the RAM or not.
  4. 13 Feb, 2021 4 commits
    • [RISCOS_]AccessPhysicalAddress uses page flags · 7924aae2
      Jeffrey Lee authored
      Currently RISCOS_AccessPhysicalAddress allows the caller to specify the
      permissions/properties of the mapped memory by directly specifying some
      of the L1 page table entry flags. This will complicate things when
      adding support for more page table formats, so change it so that
      standard RISC OS page flags are used instead (like the alternate entry
      point, RISCOS_AccessPhysicalAddressUnchecked, already uses).
      
      Also, drop the "RISCOS_" prefix from RISCOS_AccessPhysicalAddress and
      RISCOS_ReleasePhysicalAddress, and remove the references to these
      routines from the HAL docs. These routines have never been exposed to
      the HAL, so renaming them and removing them from the docs should make
      their status clearer.
      
      Version 6.52. Tagged as 'Kernel-6_52'
    • OS_FindMemMapEntries now uses logical_to_physical · b60d3a70
      Jeffrey Lee authored
      Reduce the number of routines which directly examine the page tables, by
      changing OS_FindMemMapEntries to use logical_to_physical.
    • Start moving page table code into s.ShortDesc · ca69793c
      Jeffrey Lee authored
      In preparation for the addition of long descriptor page table support,
      start moving low-level page table routines into their own file
      (s.ShortDesc) so that we can add a corresponding long descriptor
      implementation in the future.
      
      * logical_to_physical, MakePageTablesCacheable,
      MakePageTablesNonCacheable, AllocateBackingLevel2, AMB_movepagesin_L2PT,
      AMB_movecacheablepagesout_L2PT, AMB_moveuncacheablepagesout_L2PT
      routines, and PageNumToL2PT macros, all moved to s.ShortDesc with no
      changes.
      * Add new UpdateL1PTForPageReplacement routine (by splitting some code
      out of s.ChangeDyn)
    • Prepare logical_to_physical for 64bit phys addrs · 4fd2dd01
      Jeffrey Lee authored
      ppn_to_physical, logical_to_physical & physical_to_ppn have now all
      been changed to accept or return 64bit physical addresses in
      R8,R9 instead of a 32bit address in R5. However, where a phys addr is
      being provided as an input, they may currently only pay attention to the
      bottom 32 bits of the address.
  5. 16 Jan, 2021 1 commit
    • Make supervisor stack inaccessible to user mode · bbc7ad20
      Jeffrey Lee authored
      Previously the supervisor stack was read-only in user mode, but since
      the supervisor stack is typically empty when the CPU is in user mode,
      it's questionable whether any software actually makes use of this
      facility.
      
      To simplify support for the long descriptor page table format (which
      doesn't support the user-RO + privileged-RW access mode), let's
      try and remove usermode SVC stack access completely.
      
      Tested on Raspberry Pi 4
      
      Version 6.48. Tagged as 'Kernel-6_48'
  6. 23 Nov, 2020 1 commit
    • Increase RamFS limit to 2GB · a81fa868
      Julie Stamp authored
      Detail:
      RamFS now supports disc sizes up to 2GB-4KB, so raise the dynamic area limit from 508MB.
      
      Admin:
      Tested with a disc size up to 928MB
      
      Version 6.46. Tagged as 'Kernel-6_46'
  7. 19 Sep, 2020 1 commit
    • OS_DynamicArea 22 fixes · 88219988
      Jeffrey Lee authored
      Multiple fixes, mostly related to error handling.
      
      1. Ensure R1 is initialised correctly when generating BadPageNumber
      errors (labels 94 & 95). Generally this involves setting it to zero to
      indicate that no call to LogOp_MapOut is required. Failing to do this
      would typically result in a crash.
      2. When branching back to the start of the loop after calling
      GetNonReservedPage, ensure R0 is reset to zero. Failing to do this would
      have a performance impact on LogOp_MapOut, but shouldn't be fatal.
      3. In the main routine, postpone writing back DANode_Size until after
      the call to physical_to_ppn (because we may decide to abort the op
      and return an error without moving a page).
      4. Fix stack offset when accessing PMPLogOp_GlobalTBLFlushNeeded.
      Getting this wrong could potentially result in some TLB maintenance
      being skipped when moving uncacheable pages.
      5. Fix stack imbalance at label 94
      
      Version 6.43. Tagged as 'Kernel-6_43'
  8. 01 Jul, 2020 6 commits
    • Add missing AMBControl appspace shrink check · 0634b535
      Jeffrey Lee authored
      Fix GrowFreePoolFromAppSpace (i.e. appspace shrink operation) to issue
      UpCall_MemoryMoving / Service_Memory when attempting to shrink PMP-based
      appspace (i.e. AMBControl nodes). This fixes (e.g.) BASIC getting stuck
      in an abort loop if you try and use OS_ChangeDynamicArea to grow the
      free pool.
      
      Version 6.40. Tagged as 'Kernel-6_40'
    • Fix PMP appspace size check · 5014117d
      Jeffrey Lee authored
      Fix AreaGrow to read appspace size correctly when appspace is a PMP
      (i.e. an AMBControl node). Reading DANode_Size will only report the
      amount of memory currently paged in (e.g. by lazy task swapping),
      causing AreaGrow to underestimate how much it can potentially take from
      the area.
    • Fix GrowFreePool · 1a3c927f
      Jeffrey Lee authored
      Buggy since its introduction in Kernel-5_35-4_79_2_284, GrowFreePool was
      attempting to grow the free pool by shrinking application space, an
      operation which OS_ChangeDynamicArea doesn't support. Change it to grow
      the free pool instead, and fix a couple of other issues that would have
      caused it to work incorrectly (register corruption causing it to request
      a size change of zero, and incorrect assumption that
      OS_ChangeDynamicArea returns the amount unmoved, when really it returns
      the amount moved)
    • Fix combined freepool + appspace shrink · 76d04b25
      Jeffrey Lee authored
      When a DA tries to grow by more than the free pool size, the kernel
      should try to take the necessary remaining amount from application
      space. Historically this was handled as a combined "take from freepool
      and appspace" operation, but with Kernel-5_35-4_79_2_284 this was
      changed to use a nested call to OS_ChangeDynamicArea, so first appspace
      is shrunk into the free pool and then the target DA is grown using just
      the free pool.
      
      However the code was foolishly trying to use ChangeDyn_AplSpace as the
      argument to OS_ChangeDynamicArea, which that call doesn't recognise as a
      valid DA number. Change it to use ChangeDyn_FreePool ("grow free pool
      from appspace"), and also fix up a stack imbalance that would have
      caused it to misbehave regardless of the outcome.
    • Fix shrinkables check in AreaGrow · 86fe0712
      Jeffrey Lee authored
      TryToShrinkShrinkables_Bytes expected both R1 and R2 to be byte counts,
      but AreaGrow was calling with R1 as a byte count and R2 as a page count.
      This would have caused it to request the first-found shrinkable to
      shrink more than necessary, and also confuse the rest of AreaGrow when
      the page-based R2 result of TryToShrinkShrinkables gets converted to a
      byte count (when AreaGrow wants it as a page count)
    • Fix Service_Memory when shrinking appspace · 323e88c6
      Jeffrey Lee authored
      Update AreaShrink so that (when shrinking appspace) CheckAppSpace is
      passed the change amount as a negative number, so that Service_Memory is
      issued with the correct sign.
      
      Fixes issue reported on the forums, where BASIC was getting confused
      because appspace shrinks were being reported as if they were a grow
      operation:
      
      https://www.riscosopen.org/forum/forums/4/topics/15067
      
      It looks like this bug was introduced in Kernel-5_35-4_79_2_284
      (introduction of PMPs), where the logic for appspace shrinks (which must
      be performed via a grow of the free pool or some other DA) was moved
      out of AreaGrow and into AreaShrink (because appspace shrinks are now
      internally treated as "shrink appspace into free pool")
  9. 27 Feb, 2020 1 commit
  10. 12 Feb, 2020 3 commits
    • Fixes for zero-size PMPs · 0830af41
      Jeffrey Lee authored
      OS_DynamicArea 21, 22 & 25 were using the value of the PMP page list
      pointer (DANode_PMP) to determine whether the dynamic area is a PMP or
      not. However, PMPs which have had their max physical size set to zero
      will don't have the page list allocated, which will cause the test to
      fail. Normally this won't matter (those calls can't do anything useful
      when used on PMPs with zero max size), except for the edge case of where
      the SWI has been given a zero-length page list as input. Because the
      calls checked the value of DANode_PMP, they would incorrectly return
      an error.
      
      Fix this by having the code check the DA flags instead. Also, add a
      check to OS_DynamicArea 23 (PMP resize), otherwise non-PMP DAs could end
      up having page lists allocated for them.
    • Fix stack imbalance in DA release · 3a26f20e
      Jeffrey Lee authored
      In OS_DynamicArea 2, a stack imbalance would occur if an error is
      encountered while releasing the physical pages of a PMP (R1-R8 pushed,
      but only R1-R7 pulled). Fix it, but also don't bother storing R1, since
      it's never modified.
    • PMP LogOp_MapOut fixes · a4ab6171
      Jeffrey Lee authored
      * Fix caching of page table entry flags (was never updating R9, so the
      flags would be recalculated for every page)
      * Fix use of flag in bottom bit of R6; if the flag was set, the
      early-exit case for having made all the cacheable pages uncacheable would
      never be hit, forcing it to loop through the full page list instead
  11. 18 Jan, 2020 1 commit
    • Fix OS_DynamicArea 21 handling of MaxCamEntry · 5f7b9b37
      Jeffrey Lee authored
      OS_DynamicArea 21 was treating MaxCamEntry as if it was the exclusive
      upper bound, when really it's the inclusive bound. The consequence of
      this was that PMPs were unable to explicitly claim the highest-numbered
      RAM page in the system.
      
      Version 6.31. Tagged as 'Kernel-6_31'
  12. 24 Nov, 2019 1 commit
    • Add OS_DynamicArea 27+28, for supporting lots of RAM · 9224a6ca
      Jeffrey Lee authored
      OS_DynamicArea 27 is the same as OS_DynamicArea 5 ("return free
      memory"), except the result is measured in pages instead of bytes,
      allowing it to behave sensibly on machines with many gigabytes of RAM.
      
      Similarly, OS_DynamicArea 28 is the same as OS_DynamicArea 7 (internal
      DA enumeration call used by TaskManager), except the returned size
      values are measured in pages instead of bytes. A flags word has also
      been added to allow for more expansion in the future.
      
      Hdr:OSMem now also contains some more definitions which external code
      will find useful.
      
      Version 6.29. Tagged as 'Kernel-6_29'
  13. 19 Nov, 2019 1 commit
    • Allow reservation of memory pages · 1f84ad9f
      Jeffrey Lee authored
      This change adds a new OS_Memory reason code, 23, for reserving memory
      without actually assigning it to a dynamic area. Other dynamic areas can
      still use the memory, but only the code that reserved it will be allowed
      to claim exclusive use over it (i.e. PageFlags_Unavailable).
      
      This is useful for systems such as the PCI heap, where physically
      contiguous memory is required, but the memory isn't needed all of the
      time. By reserving the pages, it allows other regular DAs to make use of
      the memory when the PCI heap is small. But when the PCI heap needs to
      grow, it guarantees that (if there's enough free memory in the system)
      the previously reserved pages can be allocated to the PCI heap.
      
      Notes:
      
      * Reservations are handled on an honour system; there's no checking that
      the program that reserved the memory is the one attempting to map it in.
      
      * For regular NeedsSpecificPages DAs, reserved pages can only be used if
      the special "RESV" R0 return value is used.
      
      * For PMP DAs, reserved pages can only be made Unavailable if the entry
      in the page block also specifies the Reserved page flag. The actual
      state of the Reserved flag can't be modified via PMP DA ops, the flag is
      only used to indicate the caller's permission/intent to make the page
      Unavailable.
      
      * If a PMP DA tries to make a Reserved page Unavailable without
      specifying the Reserved flag, the kernel will try to swap it out for a
      replacement page taken from the free pool (preserving the contents and
      generating Service_PagesUnsafe / Service_PagesSafe, as if another DA
      had claimed the page)
      
      Version 6.28. Tagged as 'Kernel-6_28'
  14. 30 Sep, 2019 1 commit
    • Allow runtime adjustment of AplWorkMaxSize · 0aeea07f
      Jeffrey Lee authored
      Detail:
      This adds a new OS_DynamicArea reason code, 26, for adjusting
      AplWorkMaxSize at runtime. This allows compatibility tools such as
      Aemulor to adjust the limit without resorting to patching the kernel.
      Any adjustment made to the value will affect the upper limit of
      application space, and the lower limit of dynamic area placement.
      Attempting to adjust beyond the compile-time upper/default limit, or
      such that it will interfere with existing dynamic areas / wimpslots,
      will result in an error.
      
      Relevant forum thread:
      https://www.riscosopen.org/forum/forums/11/topics/14734
      
      Admin:
      Tested on BB-xM, desktop active & inactive
      
      Version 6.24. Tagged as 'Kernel-6_24'
  15. 16 Aug, 2019 2 commits
    • Support supersection-mapped memory in OS_Memory 24 · bd294cf9
      Ben Avison authored
      To achieve this:
      * DecodeL1Entry and DecodeL2Entry return 64-bit physical addresses in
        r0 and r1, with additional return values shuffled up to r2 and r3
      * DecodeL1Entry now returns the section size, so callers can distinguish
        section- from supersection-mapped memory
      * PhysAddrToPageNo now accepts a 64-bit address (though since the physical
        RAM table is currently still all 32-bit, it will report any top-word-set
        addresses as being not in RAM)
      
      Version 6.22. Tagged as 'Kernel-6_22'
    • Support temporary mapping of IO above 4GB using supersections · 96913c1f
      Ben Avison authored
      Add a new reason code, OS_Memory 22, equivalent to OS_Memory 14, but
      accepting a 64-bit physical address in r1/r2. Current ARM architectures can
      only express 40-bit or 32-bit physical addresses in their page tables
      (depending on whether they feature the LPAE extension or not) so unlike
      OS_Memory 14, OS_Memory 22 can return an error if an invalid physical
      address has been supplied. OS_Memory 15 should still be used to release a
      temporary mapping, whether you claimed it using OS_Memory 14 or OS_Memory 22.
      
      The logical memory map has had to change to accommodate supersection mapping
      of the physical access window, which needs to be 16MB wide and aligned to a
      16MB boundary. This results in there being 16MB less logical address space
      available for dynamic areas on all platforms (sorry) and there is now a 1MB
      hole spare in the system address range (above IO).
      
      The internal function RISCOS_AccessPhysicalAddress has been changed to
      accept a 64-bit physical address. This function has been a candidate for
      adding to the kernel entry points from the HAL for a long time - enough that
      it features in the original HAL documentation - but has not been so added
      (at least not yet) so there are no API compatibility issues there.
      
      Requires RiscOS/Sources/Programmer/HdrSrc!2
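      
      A hedged sketch of claiming and releasing a temporary mapping from C
      with _swix. Only the details stated above are certain (reason code 22,
      the 64-bit physical address split across R1/R2, release via OS_Memory
      15); the size register and the register carrying the returned logical
      address are assumptions for illustration, so check the OS_Memory
      documentation before relying on them:
      
      #include <stdint.h>
      #include "kernel.h"
      #include "swis.h"
      
      void *map_high_io(uint64_t phys, unsigned int size)
      {
          void *log = NULL;
          /* R0 = 22, R1 = low word of the physical address, R2 = high word,
           * R3 = size (assumed); logical address assumed returned in R3. */
          if (_swix(OS_Memory, _INR(0,3) | _OUT(3),
                    22, (unsigned int)phys, (unsigned int)(phys >> 32), size,
                    &log) != NULL)
              return NULL;   /* e.g. address not expressible in the page tables */
          return log;
      }
      
      void unmap_high_io(void)
      {
          /* OS_Memory 15 releases the temporary mapping, whether it was
           * claimed with OS_Memory 14 or OS_Memory 22. */
          _swix(OS_Memory, _IN(0), 15);
      }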
  16. 30 Jun, 2018 1 commit
    • Simplify initial AplSpace claim · 526764e1
      ROOL authored
      Detail:
        As the application slot is now a normal dynamic area, there's no need to manipulate the CAM directly. Convert FudgeSomeAppSpace into an OS_ChangeDynamicArea SWI followed by memset().
        ChangeDyn.s: Offset by 32k to account for the -32k that dynamic area -1 has.
        NewReset.s: Delete FudgeSomeAppSpace and replace as above.
      Admin:
        Submission from Timothy Baldwin.
      
      Version 6.08. Tagged as 'Kernel-6_08'
  17. 14 Apr, 2018 1 commit
    • Fix ability for PMPs to claim specific pages · 5e3e9d38
      Jeffrey Lee authored
      Detail:
        s/ChangeDyn - Due to the way that some page flags map to the same bits as (different) DA flags, the Batcall that PMP_PreGrow makes in order to claim the requested page was getting confused and thinking that the special DMA PreGrow handler should be used instead of the DA-specific one (which in this case is a custom one responsible for claiming the right page). Modify PMP_PreGrow so that it only supplies DA flags to the Batcall, and patches in any custom page flags afterwards.
        Also swap magic number for appropriate symbol in PMPGrowHandler.
      Admin:
        Tested on BB-xM
        Fixes CAM corruption when a PMP claims a specific page, due to the PMP code and DA code disagreeing about which page should be used
      
      
      Version 6.00. Tagged as 'Kernel-6_00'
  18. 11 Jan, 2017 1 commit
  19. 13 Dec, 2016 4 commits
    • Implement support for cacheable pagetables · 65fa6a28
      Jeffrey Lee authored
      Detail:
        Modern ARMs (ARMv6+) introduce the possibility for the page table walk hardware to make use of the data cache(s) when performing memory accesses. This can significantly reduce the cost of a TLB miss on the system, and since the accesses are cache-coherent with the CPU it allows us to make the page tables cacheable for CPU (program) accesses also, improving the performance of page table manipulation by the OS.
        Even on ARMs where the page table walk can't use the data cache, it's been measured that page table manipulation operations can still benefit from placing the page tables in write-through or bufferable memory.
        So with that in mind, this set of changes updates the OS to allow cacheable/bufferable page tables to be used by the OS + MMU, using a system-appropriate cache policy.
        File changes:
        - hdr/KernelWS - Allocate workspace for storing the page flags that are to be used by the page tables
        - hdr/OSMem - Re-specify CP_CB_AlternativeDCache as having a different behaviour on ARMv6+ (inner write-through, outer write-back)
        - hdr/Options - Add CacheablePageTables option to allow switching back to non-cacheable page tables if necessary. Add SyncPageTables var which will be set {TRUE} if either the OS or the architecture requires a DSB after writing to a faulting page table entry.
        - s/ARM600, s/VMSAv6 - Add new SetTTBR & GetPageFlagsForCacheablePageTables functions. Update VMSAv6 for wider XCBTable (now 2 bytes per element)
        - s/ARMops - Update pre-ARMv7 MMU_Changing ARMops to drain the write buffer on entry if cacheable pagetables are in use (ARMv7+ already has this behaviour due to architectural requirements). For VMSAv6 Normal memory, change the way that the OS encodes the cache policy in the page table entries so that it's more compatible with the encoding used in the TTBR.
        - s/ChangeDyn - Update page table page flag handling to use PageTable_PageFlags. Make use of new PageTableSync macro.
        - s/Exceptions, s/AMBControl/memmap - Make use of new PageTableSync macro.
        - s/HAL - Update MMU initialisation sequence to make use of PageTable_PageFlags + SetTTBR
        - s/Kernel - Add PageTableSync macro, to be used after any write to a faulting page table entry
        - s/MemInfo - Update OS_Memory 0 page flag conversion. Update OS_Memory 24 to use new symbol for page table access permissions.
        - s/MemMap2 - Use PageTableSync. Add routines to enable/disable cacheable pagetables
        - s/NewReset - Enable cacheable pagetables once we're fully clear of the MMU initialisation sequence (doing it earlier would be trickier due to potential double-mapping)
      Admin:
        Tested on pretty much everything currently supported
        Delivers moderate performance benefits to page table ops on old systems (e.g. 10% faster), astronomical benefits on some new systems (up to 8x faster)
        Stats: https://www.riscosopen.org/forum/forums/3/topics/2728?page=2#posts-58015
      
      
      Version 5.71. Tagged as 'Kernel-5_71'
    • Make MMU_Changing ARMops perform the sub-operations in a sensible order · 9a96263a
      Jeffrey Lee authored
      Detail:
        For a while we've known that the correct way of doing cache maintenance on ARMv6+ (e.g. when converting a page from cacheable to non-cacheable) is as follows:
        1. Write new page table entry
        2. Flush old entry from TLB
        3. Clean cache + drain write buffer
        The MMU_Changing ARMops (e.g. MMU_ChangingEntry) implement the last two items, but in the wrong order. This has caused the operations to fall out of favour and cease to be used, even in pre-ARMv6 code paths where the effects of improper cache/TLB management perhaps weren't as readily visible.
        This change re-specifies the relevant ARMops so that they perform their sub-operations in the correct order to make them useful on modern ARMs, updates the implementations, and updates the kernel to make use of the ops wherever relevant.
        File changes:
        - Docs/HAL/ARMop_API - Re-specify all the MMU_Changing ARMops to state that they are for use just after a page table entry has been changed (as opposed to before - e.g. 5.00 kernel behaviour). Re-specify the cacheable ones to state that the TLB invalidatation comes first.
        - s/ARM600, s/ChangeDyn, s/HAL, s/MemInfo, s/VMSAv6, s/AMBControl/memmap - Replace MMU_ChangingUncached + Cache_CleanInvalidate pairs with equivalent MMU_Changing op
        - s/ARMops - Update ARMop implementations to do everything in the correct order
        - s/MemMap2 - Update ARMop usage, and get rid of some lingering sledgehammer logic from ShuffleDoublyMappedRegionForGrow
      Admin:
        Tested on pretty much everything currently supported
      
      
      Version 5.70. Tagged as 'Kernel-5_70'
    • Place restrictions on the use of cacheable doubly-mapped DAs · 2704c756
      Jeffrey Lee authored
      Detail:
        The kernel has always allowed software to create cacheable doubly-mapped DAs, despite the fact that the VIVT caches used on ARMv5 and below would have no way of keeping both of the mappings coherent
        This change places the following restrictions on doubly-mapped areas, to ensure that cache settings which can't be supported by the cache architecture of the CPU can't be selected:
        * On ARMv6 and below, cacheable doubly-mapped areas aren't supported.
          * Although ARMv6 has VIPT data caches, it's also subject to page colouring constraints which would require us to force the DA size to be a multiple of 16k. So for now keep things simple and disallow cacheable doubly-mapped areas on ARMv6.
        * On ARMv7 and above, cacheable doubly-mapped areas are allowed, but only if they are marked non-executable
          * The blocker to allowing executable cacheable doubly-mapped areas is the VIPT instruction caches; OS_SynchroniseCodeAreas (or callers of it) would need to know that a doubly-mapped area is in use so that they can flush both mappings from the I-cache. Although some chips do have PIPT instruction caches, again it isn't really worth supporting executable cacheable doubly-mapped areas at the moment.
        These changes also allow us to get rid of the expensive 'sledgehammer' logic when dealing with doubly-mapped areas
        File changes:
        - s/ARM600, s/VMSAv6 - Remove the sledgehammer logic, only perform cache/TLB maintenance for the required areas
        - s/ChangeDyn - Implement the required checks
        - s/MemMap2 - Move some cache maintenance logic into RemoveCacheabilityR0ByMinusR2, which previously would have had to be performed by the caller due to the sledgehammer paranoia
      Admin:
        Cacheable doubly-mapped DAs tested on iMx6 (tried making screen memory write-through cacheable; decent performance gain seen)
        Note OS_Memory 0 "make temporarily uncacheable" doesn't work on doubly-mapped areas, so cacheable doubly-mapped areas are not yet safe for general DMA
      
      
      Version 5.69. Tagged as 'Kernel-5_69'
    • Make s/ChangeDyn slightly more readable by splitting some routines out into a separate file · 4a6150dc
      Jeffrey Lee authored
      Detail:
        s/MemMap2 - New file containing assorted low-level memory mapping routines taken from s/ChangeDyn. N.B. There's no special significance to this being named "MemMap2", it's just a name that stuck due to some earlier (abandoned) changes which added a file named "MemMap".
        s/ChangeDyn - Remove the routines/chunks of code that were moved to s/MemMap2. Also some duplicate code removal (Regular DA grow code and DoTheGrowNotSpecified now rely on the new DoTheGrowCommon routine for doing the actual grow)
        s/GetAll - GET s/MemMap2 at an appropriate time
      Admin:
        Tested on pretty much everything currently supported
      
      
      Version 5.67. Tagged as 'Kernel-5_67'