Modify sed command line to be syntactically correct. It was inclined to ignore the '$<', thus expecting its input from the command line. Replaced with '< $'. Found needed when building multiple ROMs using DDE31c. sed is unchanged.
Hmm... OK, I see it should be a 2021 issue. Will close this unless I can repeat it using that, thanks.
The amu active here is 5.34 [30 Oct 2020] ... is that the one expected? I was surprised to see the issue, but it cropped up on 2 different machines on different builds. It's not been an issue before. Specifically, it occurred on builds checked out over the weekend.
It looks syntactically correct as it is. It's worked in the last 450 nightly builds without issue; did you forget to update your amu?
Save ~6.5k in the copy which ends up in ResourceFS, while carefully leaving them present in the source file.
This carbon offsets the space added to RiscOS/Sources/Apps/Draw!8 so I don't feel too guilty.
ROOL (308045b9) at 11 Feb 11:19
Trim comments from encodings
Agreed that reading the encoding with Font_ReadDefn every plot would be bad. I haven't looked at the Wimp code yet to see if caching would be an option. It could probably be managed sensibly as long as the cached data was updated every time the icon's font was changed. I don't think there's any other way the font encoding could change. But the validation string option is also attractive.
Another thing to consider is that the inverse of this behaviour is probably also desirable. That is, when running with a UTF-8 system alphabet, if an icon's font was specifically set to "\ELatin1" (or any other 8-bit encoding; in fact anything other than UTF-8), the app developer would reasonably expect the Wimp to handle rendering and caret movement correctly. Currently, Wimp symbol replacement and caret movement would be handled as per the system alphabet, not the icon's specific encoding.
So a validation string option perhaps should be along the lines of "icon contains UTF-8 text" if system alphabet isn't UTF-8, and "icon contains 8-bit text" if the system alphabet is UTF-8.
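As a rough illustration of that rule, the flag would in effect mean "this icon's text uses the opposite kind of encoding to the system alphabet". The function and parameter names below are hypothetical and not part of the Wimp; this is only a sketch of the decision, not of how the Wimp would parse a validation string.

```python
def icon_text_is_utf8(system_alphabet_is_utf8: bool, validation_flag_set: bool) -> bool:
    """Decide whether an icon's text should be handled as UTF-8.

    Hypothetical helper: the flag is assumed to mean "the icon's text is
    NOT in the same kind of encoding as the system alphabet", so the
    result is the system setting inverted whenever the flag is present.
    """
    return system_alphabet_is_utf8 != validation_flag_set


# 8-bit system alphabet, flag present: icon holds UTF-8 text.
print(icon_text_is_utf8(False, True))
# UTF-8 system alphabet, flag present: icon holds 8-bit text.
print(icon_text_is_utf8(True, True))
```

With the flag absent, the icon simply follows the system alphabet, which matches the current behaviour.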
All the above notwithstanding, I think the provided patch to FM is desirable because it helps RO4/6 users, whose Wimps will be replacing Wimp symbols with no regard for UTF-8 text encoding, and who won't be able to softload the RO5 Wimp (I believe), whilst not adding a significant performance hit to RO5 users. It also serves RO5 users in the interim while the above Wimp changes are being made. If someone with write access could action the merge, that would be great.
This change to Font Manager fixes the problem of UTF8 strings being garbled in Wimp icons when the system alphabet isn't set to UTF8, but the encoding of the font set for the icon is UTF8.
Currently, when the system alphabet is not UTF8, even if an icon has been set to use a font opened with "\EUTF8", the Wimp still replaces all special wimp symbols in the &8x range with [code change to WimpSymbol font] + symbol + [code change back] (as long as the U+FFFD replacement character isn't defined in the font*1). This is undesirable if the icon contains UTF8-encoded text, since top-bit set bytes (in the &8x range) are part of valid sequences. This change causes the Font Manager to skip font change codes if they occur in the middle of a UTF8 sequence.
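The idea of skipping font changes that land inside a UTF8 sequence can be sketched roughly as below. This is an illustration in Python rather than the actual Font Manager assembler, and it assumes a font change is control code 26 followed by a single font-handle byte; the function names are made up for the example.

```python
FONT_CHANGE = 26  # assumed layout: control code 26, then one font-handle byte


def utf8_expected_length(lead: int) -> int:
    """Total sequence length implied by a UTF-8 lead byte (1 if not a lead byte)."""
    if 0xC0 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF7:
        return 4
    return 1  # ASCII, stray continuation byte, or invalid byte


def skip_mid_sequence_font_changes(data: bytes) -> bytes:
    """Drop font-change pairs that appear while continuation bytes are pending."""
    out = bytearray()
    pending = 0  # continuation bytes still expected for the current character
    i = 0
    while i < len(data):
        b = data[i]
        if b == FONT_CHANGE and i + 1 < len(data):
            if not pending:
                out += data[i:i + 2]  # a genuine font change: keep code and handle
            i += 2                    # either way, step over the code and its handle
            continue
        if pending and 0x80 <= b <= 0xBF:
            pending -= 1              # one more continuation byte consumed
        else:
            pending = utf8_expected_length(b) - 1
        out.append(b)
        i += 1
    return bytes(out)


# E3 81 82 with font changes wrapped around the two continuation bytes,
# as a symbol-substituting caller might produce (handles 5 and 1 are arbitrary).
garbled = bytes([0xE3, 26, 5, 0x81, 26, 1, 26, 5, 0x82, 26, 1])
cleaned = skip_mid_sequence_font_changes(garbled)
```

In this sketch the final change back to the original font is kept, because by then the sequence is complete and the code is an ordinary font change again.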
If applications want to use wimp symbols in UTF8 sequences, they should use the UTF8 sequences thereof, as detailed in the documentation.
Also, unrelated: changes some debugging code Entry macros to EntryS, and changes a register recovery from the stack to use the FramLDR macro.
Would it make sense to have the Wimp check the encoding (from the font handle, from the icon) so it doesn't mess up the WimpSymbol substitution in the first place?
Agree that it looks like the only way to deduce UTF-8 is Font_ReadDefn with R3=FULL then pick off the encoding. I wouldn't want the Wimp thrashing that SWI every time the text in an icon needs plotting (eg. dragging a window over it), so either caching it or having the application developer signal it via a new Y validation would be best.
I'm not aware of the Wimp keeping any per-icon private flags like it does per-window, though I didn't look too hard. In any case, caching something may not be practical: if the icon's font handle was changed, the Wimp would need to track that. Maybe that says a new Y validation "Icon contains UTF-8 encoded text" is the way out, and the validation can be ignored if the system alphabet is UTF-8.
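For comparison, the caching alternative might look something like the sketch below: remember the encoding per icon alongside the font handle it was read for, and only query again when the handle differs. Everything here is hypothetical (the class, the `read_encoding` callback standing in for a Font_ReadDefn lookup, and the icon keys); it is not how the Wimp stores icon data.

```python
class IconEncodingCache:
    """Hypothetical per-icon cache of font encodings, keyed by icon."""

    def __init__(self, read_encoding):
        # read_encoding(font_handle) stands in for the expensive lookup
        # (e.g. a Font_ReadDefn call); it is an assumption of this sketch.
        self._read_encoding = read_encoding
        self._entries = {}  # icon key -> (font handle, encoding name)

    def encoding_for(self, icon, font_handle):
        cached = self._entries.get(icon)
        if cached is not None and cached[0] == font_handle:
            return cached[1]  # same handle as last time: no lookup needed
        encoding = self._read_encoding(font_handle)
        self._entries[icon] = (font_handle, encoding)
        return encoding


lookups = []

def fake_read_encoding(handle):
    """Test double: records each lookup and returns a made-up encoding name."""
    lookups.append(handle)
    return "UTF8" if handle == 7 else "Latin1"

cache = IconEncodingCache(fake_read_encoding)
first = cache.encoding_for("icon0", 7)
second = cache.encoding_for("icon0", 7)   # served from the cache
third = cache.encoding_for("icon0", 9)    # handle changed: looked up again
```

A cache like this only notices a change when the handle value differs, so it would presumably still need invalidating if a handle were released and later reissued for a different font.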
Yes, definitely. Either check the font handle to see what the encoding is, or (John-Mark Bell's suggestion), utilise a validation string "feature flag" (see here). Because really, the Wimp should be applying all its special UTF8 handling to writable icons set to use a font opened with \EUTF8, too. Caret handling, etc. At the moment, it decides everything based off the system alphabet, which isn't granular enough.
But this is a simple intermediate fix that immediately allows (especially East Asian) text to display without being garbled.
For reference, pushfontstring is here, and you'd need to call into Font_ReadDefn to get the encoding that was asked for when you only know the font handle, I think.
Would it make sense to have the Wimp check the encoding (from the font handle, from the icon) so it doesn't mess up the WimpSymbol substitution in the first place? Routine pushfontstring looks like it's where the business is done in the Wimp.