Commits · 69a258163d4d80511e7febf43ebeba39b48f4ae8 · RiscOS / Sources / Lib / UnicodeLib

05 Dec, 2008 1 commit

Fix bugs and inconsistencies in encoding handlers. · 69a25816

  Fix inconsistency in handling illegal byte sequences.
  Convert surrogate codepoints and U+FFFE, U+FFFF to U+FFFD.
  Also, a few extra mappings.
Detail:
  enc_utf8.c: 0x80 is a continuation byte. Map stray ones to U+FFFD.
              Reset the count of expected continuation bytes to 0 when
              encountering illegal byte sequences. Previously, if the character
              callback returned non-zero, this count would not be reset, thus
              leaving the codec in an inconsistent state. Additionally, we no
              longer consume the illegal continuation byte: instead, we process
              it as a start byte next time round.
  encoding.c: Do not load extension tables for ISO-8859-{1,2,9,10,15,16}
              If these are needed, it's probably best that different charset
              names are used rather than overloading 8859-n.
  iso2022.c:  Permit SS2/3 escape sequences for EUC encode/decode.
              Disable C1 chara...

69a25816
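
The enc_utf8.c behaviour described above can be illustrated with a minimal sketch. It is not the library's code: `utf8_state`, `sanitise` and `utf8_decode` are hypothetical names, and the sketch assumes a simple streaming decoder that does not check for overlong forms or values above U+10FFFF. It shows the three points from the commit message: stray continuation bytes become U+FFFD, the expected-continuation count is reset before anything is emitted, and the offending byte is left unconsumed so that it is re-examined as a start byte.

```c
#include <stddef.h>
#include <stdint.h>

#define REPLACEMENT 0xFFFDu

/* Hypothetical decoder state; not the library's actual structure. */
typedef struct {
    uint32_t acc;     /* code point bits gathered so far */
    int      expect;  /* continuation bytes still expected */
} utf8_state;

/* Map surrogates, U+FFFE and U+FFFF to U+FFFD. */
static uint32_t sanitise(uint32_t c)
{
    if (c >= 0xD800u && c <= 0xDFFFu) return REPLACEMENT;
    if (c == 0xFFFEu || c == 0xFFFFu) return REPLACEMENT;
    return c;
}

/* Decode n bytes from src into out (at most max code points).
 * Returns the number of code points stored. */
size_t utf8_decode(utf8_state *st, const unsigned char *src, size_t n,
                   uint32_t *out, size_t max)
{
    size_t i = 0, count = 0;

    while (i < n && count < max) {
        unsigned char b = src[i];

        if (st->expect > 0) {
            if ((b & 0xC0u) == 0x80u) {
                st->acc = (st->acc << 6) | (b & 0x3Fu);
                i++;
                if (--st->expect == 0)
                    out[count++] = sanitise(st->acc);
            } else {
                /* Illegal sequence: reset the count first, emit U+FFFD,
                 * and do not consume b; it is handled as a start byte
                 * on the next pass round the loop. */
                st->expect = 0;
                out[count++] = REPLACEMENT;
            }
            continue;
        }

        i++;
        if (b < 0x80u)      out[count++] = b;
        else if (b < 0xC0u) out[count++] = REPLACEMENT; /* stray 0x80..0xBF */
        else if (b < 0xE0u) { st->acc = b & 0x1Fu; st->expect = 1; }
        else if (b < 0xF0u) { st->acc = b & 0x0Fu; st->expect = 2; }
        else if (b < 0xF8u) { st->acc = b & 0x07u; st->expect = 3; }
        else                out[count++] = REPLACEMENT;
    }
    return count;
}
```

With this sketch, the two bytes 0xE2 0x41 decode to U+FFFD followed by U+0041, because the 'A' that interrupts the sequence is not swallowed.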

26 Aug, 2005 1 commit
- Added Latin10. · a3d2481a
  Kevin Bracey authored 19 years ago
```
Version 0.55. Tagged as 'Unicode-0_55'
```
  a3d2481a
25 Aug, 2005 1 commit

* Added support for ISO 6937:2001, and the variant with Euro used by DVB. · 4f101f04

Kevin Bracey authored 19 years ago

  (This isn't integrated with ISO 2022 processing though - it's standalone).
* Added a Dstroke -> Eth second-attempt conversion in various write routines,
  primarily for ISO 6937 -> Latin1 conversion (ISO 6937 unifies them).

Version 0.54. Tagged as 'Unicode-0_54'

4f101f04
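
The "second-attempt" Dstroke -> Eth conversion can be pictured roughly as follows. This is only a sketch under assumptions: `latin1_first_attempt` and `latin1_encode` are hypothetical names, the target is taken to be ISO 8859-1, and only the capital letter (U+0110 to U+00D0) is retried, since the commit message does not say whether the lowercase forms are treated the same way.

```c
#include <stdint.h>

/* Hypothetical first attempt: a direct mapping to ISO 8859-1.
 * Returns the output byte, or -1 if the character is not representable. */
static int latin1_first_attempt(uint32_t c)
{
    return c <= 0xFFu ? (int)c : -1;
}

/* Encode with a second attempt: if U+0110 (capital D with stroke)
 * cannot be written, retry as U+00D0 (capital Eth). */
int latin1_encode(uint32_t c)
{
    int b = latin1_first_attempt(c);

    if (b < 0 && c == 0x0110u)
        b = latin1_first_attempt(0x00D0u);
    return b;
}
```

Doing the substitution only after the direct mapping fails means target sets that do contain Dstroke are unaffected.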

01 Jul, 2004 1 commit

Build changes · c1d14222

Steve Revill authored 20 years ago

Detail:
  Builds on 32-bit machine even with 26-bit environment.
  Fixed c.encoding so that it builds with newer tools.
Admin:
  Works in Baseline 500 build.

Version 0.53. Tagged as 'Unicode-0_53'

c1d14222

05 Mar, 2004 1 commit

Change merged from Pace repository: · 37b69d9e

Steve Revill authored 21 years ago

> Summary:
>   Merged changes from branch tree
>   Reversed previous change
> Detail:
>
> * Merged a few changes/fixes from the Unicode library in
>  branch's tree.
>
> * Reversed Steve's change from version 0.50. The change wasn't
>  necessary, and with the changed definition of NOT_USED in this
>  version, it compiles fine with cc 5.45.
>
> * Small comment change in unix.c. It now states that the file
>  isn't equivalent to any in the branch tree.
>
> Admin:
>   Built and briefly tested using TextConv utility on Risc PC.

Version 0.52. Tagged as 'Unicode-0_52'

37b69d9e

23 Jul, 2002 1 commit

Fixed to build with the latest cc (5.45) compiler. · 098e3087

Steve Revill authored 22 years ago

Detail:
  This version now builds with cc-5_45. Note: it has not been verified as
  actually functioning correctly.
Admin:
  Tested in DSL Baseline build.

Version 0.50. Tagged as 'Unicode-0_50'

098e3087

10 Jun, 2002 2 commits

Removed some warnings on unused variables. · 4671c87b

Stewart Brodie authored 22 years ago

  Fixed a comparison of a plain char (signedness issue)
Admin:
  These were from NCBrowser 5.28 too - but got forgotten in the last checkin :-(
  I've not tried using this library.


Version 0.48. Tagged as 'Unicode-0_48'

4671c87b
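
The commit does not show the comparison that was fixed, but the general class of bug is easy to sketch. Whether plain `char` is signed depends on the compiler, so a test such as `c >= 0x80` can never be true where it is signed. The function name below is hypothetical; the cast is the usual portable remedy.

```c
/* Returns 1 if the byte has its top bit set, 0 otherwise.
 * Comparing the plain char directly (c >= 0x80) would always be false
 * on compilers where char is signed; casting first avoids that. */
int is_high_byte(char c)
{
    return (unsigned char)c >= 0x80;
}
```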

Merge of bug fixes from NCBrowser tree. · 0524cabb

Stewart Brodie authored 22 years ago

Detail:
  Buffer overrun fixed; some buffer counting problems fixed too.  There are
    now helpful initialisation and tidy-up routines you can call too (called
    encoding_initialise and encoding_tidyup).
Admin:
  I've built this with cc 5.45 in basic build environment - it built OK.
  This source code now matches that in NCBrowser 5.28.


Version 0.47. Tagged as 'Unicode-0_47'

0524cabb

13 Oct, 2000 1 commit

More synchronisation with Unicode lib in branched tree · 4e5abb29

John Beranek authored 24 years ago

Detail:
  Added some changes from Unicode lib in branched tree.  All basically
   type changes.  This appears to be because other compilers are
   more picky about types than armcc.

Admin:
  Will add 0.46 VersionNum file into branched tree, and all will be
   synchronised fully.


Version 0.46. Tagged as 'Unicode-0_46'

4e5abb29

05 Oct, 2000 1 commit

John Beranek authored 24 years ago

Detail:
  Copyright messages changed from E-14 to Pace throughout, filename
   placed at top of file throughout, instead of in just some files.

  Merged branch's fixes into our code base, plus made it possible to
   get nice debug output in branched tree, and vfprintf() to stderr in
   RISC OS tree.  Exactly same source used in branched tree now (apart
   from OS specific files riscos.c and unix.c moving into layers
   directory structure).

Admin:
  Built for branched, both Unix and RISC OS.
  Built in RISC OS tree, and compiled into TextConv.


Version 0.45. Tagged as 'Unicode-0_45'

b5fafb8f

16 Sep, 1999 1 commit
- Typo in enc_scsu.c corrected. · af544d8c
  Kevin Bracey authored 25 years ago
```
ISO 8859-8 is now ISO-IR 198 (05/14).

Version 0.43. Tagged as 'Unicode-0_43'
```
  af544d8c
14 Sep, 1999 1 commit

Improved SCSU "to lock or not to lock" learning. · a78a9156

Kevin Bracey authored 25 years ago

Improved handling of SIP ideographs.
Added ISO-8859-11 (csISOLatinThai).
Renamed Latin13 to Latin7.

Version 0.42. Tagged as 'Unicode-0_42'

a78a9156

13 Sep, 1999 1 commit

SCSU encoder made aware of SIP (Supplementary Plane for Unified CJK Ideographs). · 3786b3f0

Kevin Bracey authored 25 years ago

UTF-8 encoder handles out-of-space conditions correctly.
ISO 2022 encoder/decoder doesn't try to load table 7E (the null table).
encoding_new() does identify a null MIME string with auto-detect Japanese.
UnicodeData 3.0.0 imported.

Version 0.41. Tagged as 'Unicode-0_41'

3786b3f0

04 Aug, 1999 1 commit

Added Windows-1254. · 47a736c9

Kevin Bracey authored 25 years ago

Changed default language of Latin-5 (ISO 8859-9) from English to Turkish.

Version 0.40. Tagged as 'Unicode-0_40'

47a736c9

26 Mar, 1999 1 commit

Added new header file iso3166.h with list of country codes · 24f66e1b

Simon Middleton authored 25 years ago

Modified encoding.c so that Chinese encodings use the correct
country code as a secondary tag to the language code so that
we can distinguish Chinese Simplified and Traditional.

Version 0.39. Tagged as 'Unicode-0_39'

24f66e1b

23 Mar, 1999 1 commit

Fixed encoding_table_remove_unused() which totally failed to work correctly... · 297b22a7

Simon Middleton authored 26 years ago

Fixed encoding_table_remove_unused(), which totally failed to work correctly and would most likely crash as soon as it tried to free any tables.

Verified that fixed version does work within branched.

Version 0.38. Tagged as 'Unicode-0_38'

297b22a7

18 Mar, 1999 1 commit

Fixed encoding_new() so that it returns NULL if an encoding is chosen that... · 0d137580

Simon Middleton authored 26 years ago

Fixed encoding_new() so that it returns NULL if an encoding is chosen that does not have an encoding structure with it.

e.g. encoding 0 or AutoDetectJP.

Version 0.37. Tagged as 'Unicode-0_37'

0d137580

12 Mar, 1999 2 commits

Changed encoding_table_remove_unused() so that it takes a parameter giving the... · d137d7a7

Simon Middleton authored 26 years ago

Changed encoding_table_remove_unused() so that it takes a parameter giving the depth from which to start purging.

Fixed ISO2022 write code to free search tables.
Added unix.c for unix-targeted builds.
Updated cross-compile build.
Added unix-targeted build of library and textconv tool
in ccsolaris directory.

Version 0.36. Tagged as 'Unicode-0_36'

d137d7a7

Updated ISO 2022 handling to write ISO 2022-KR and ISO 2022-CN. Fixed various bugs. · 9e28e506

Kevin Bracey authored 26 years ago

x-Current encoding didn't work if International 1.50 wasn't loaded.
Adjusted various ISO 2022 escape sequence tables to change prioritisation.
ISO 2022 writer won't shift character set until required.

Version 0.35. Tagged as 'Unicode-0_35'

9e28e506

11 Mar, 1999 1 commit

Implemented SCSU and UTF-7. · 30550b96

Kevin Bracey authored 26 years ago

Added encoding_set_flags().
Proper handling of byte order marks in UTF-16 and UCS-4.
Fixed UTF-16 surrogate writing.
Adjusted various MIME charset identifiers.
Incorporated latest Unicode Character Database (2.1.8).
Added "current system alphabet" encoding.
Created "TextConv" command line character set conversion utility.

Version 0.34. Tagged as 'Unicode-0_34'

30550b96
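
For reference, writing a supplementary-plane character as a UTF-16 surrogate pair follows a fixed formula. The sketch below is not taken from the library; `utf16_encode` is a hypothetical name, and returning 0 for unencodable values is an assumption made for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Write code point c as one or two UTF-16 code units.
 * Returns the number of units written, or 0 if c cannot be encoded
 * (a surrogate code point, or a value above U+10FFFF). */
size_t utf16_encode(uint32_t c, uint16_t out[2])
{
    if (c >= 0xD800u && c <= 0xDFFFu)
        return 0;
    if (c < 0x10000u) {
        out[0] = (uint16_t)c;
        return 1;
    }
    if (c > 0x10FFFFu)
        return 0;

    c -= 0x10000u;                               /* 20 bits remain */
    out[0] = (uint16_t)(0xD800u | (c >> 10));    /* high surrogate: top 10 bits */
    out[1] = (uint16_t)(0xDC00u | (c & 0x3FFu)); /* low surrogate: bottom 10 bits */
    return 2;
}
```

For example, U+1F600 becomes the pair 0xD83D 0xDE00.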

24 Feb, 1999 2 commits

Added copyright messages to all source files and unified the header #define's. · a2254cad
Simon Middleton authored 26 years ago
```
Version 0.33. Not tagged
```
a2254cad

Created new file riscos.c for RISC OS specific functions. Rest of library... · ff925330

Simon Middleton authored 26 years ago

Created new file riscos.c for RISC OS specific functions. Rest of library should remain portable. Moved function to load a map file into that new file. Added #defines for directory separator and wild card characters and updated the various file names.

Version 0.33. Tagged as 'Unicode-0_33'

ff925330

23 Feb, 1999 2 commits
- Added Korean Johab encoding (untested). · 457fc5c6
  Kevin Bracey authored 26 years ago
```
Reinstated use of data->data relocations.

Version 0.32. Not tagged
```
  457fc5c6
- Mac Cyrillic, Ukrainian, Central European added. · 09c75eb8
  Kevin Bracey authored 26 years ago
```
DOS code page 866 (Russian) added.

Version 0.32. Tagged as 'Unicode-0_32'
```
  09c75eb8
05 Jan, 1999 1 commit
- Changed EUC JP to use ASCII rather than JIS Roman in G0 set. · 3b420100
  Simon Middleton authored 26 years ago
```
Version 0.30. Tagged as 'Unicode-0_30'
```
  3b420100
16 Nov, 1998 1 commit

Updated all the writers to ignore the NULL_UCS4 character (as had been... · 103112be

Simon Middleton authored 26 years ago

Updated all the writers to ignore the NULL_UCS4 character (as had been previously added to the iso2022_escapes case). Any new writers should flush any pending characters they may have at this point.

Also updated enc_UCS4.c and utf8.c to turn all illegal characters
(top bit set) into FFFD.

Version 0.28. Tagged as 'Unicode-0_28'

103112be

06 Nov, 1998 1 commit

Added new function encoding_default_mime_type() which given an encoding number... · 717fb443

Simon Middleton authored 26 years ago

Added new function encoding_default_mime_type() which given an encoding number returns the first mime type from the matching entry in the table.

Version 0.27. Tagged as 'Unicode-0_27'

717fb443

15 Sep, 1998 1 commit
- Added ISO 2022-JP-1. Faffed around with ISO 2022-JP-x table lists. · d2831a19
  Kevin Bracey authored 26 years ago
```
Version 0.19. Tagged as 'Unicode-0_19'
```
  d2831a19
10 Sep, 1998 1 commit

MIME type changes. ISO-IR-... form added for Latin3, 4, 5, 6, Cyrillic, · 11d6e5f1

Andrew Hodgkinson authored 26 years ago

Greek and Hebrew. ISO-8859-.. added for Celtic, which is renamed to
csISOLatin8 in the header file from csCeltic; csISOLatin9 added (ISO-IR-203);
csSami ISO-8859-15 MIME type form removed to not clash with csISOLatin9
(added to the header, defined as 4007 to follow on from csISOLatin8).

11d6e5f1

04 Sep, 1998 1 commit
- Added entry for ISO-IR-199 (Celtic) · 98917b91
  Kevin Bracey authored 26 years ago
```
Version 0.15. Tagged as 'Unicode-0_15'
```
  98917b91
06 Mar, 1998 1 commit
- Added Microsoft Cyrillic (CP1251) · 2f18bf6c
  Kevin Bracey authored 27 years ago
```
Version 0.14. Tagged as 'Unicode-0_14'
```
  2f18bf6c
05 Jan, 1998 1 commit

Fixed autojp state machine. It wasn't resetting 'state' to HAD_NONE after... · 407bccff

Simon Middleton authored 27 years ago

Fixed autojp state machine. It wasn't resetting 'state' to HAD_NONE after changing whatcode. So basically it was lucky it ever worked. Also rewrote the various range tests to only use one compare per case.

Changed the 'for_encoding' parameter to encoding_write() to an enumeration.
Added a new type of writing where if the character cannot be encoded then
the function returns -1 rather than writing a default character
Added the pseudo-charsets csAutodetectJP and csEUCorShiftJIS to the encoding
table so that they return the correct default language (ja).
Added function to remove unused encoding tables (must be called explicitly).
Fixed usage counting in iso2022 (I think).
When looking up an encoding name, try stripping 'x-' and 'X-' off the front if
it can't be found on the first pass.

Version 0.12. Tagged as 'Unicode-0_12'

407bccff
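
The "one compare per case" range tests mentioned above are commonly written with unsigned arithmetic. The sketch below shows that general technique, not the actual autojp code. `in_range` and `is_sjis_lead` are hypothetical names, and the Shift-JIS lead-byte ranges are included purely as an illustration.

```c
/* Non-zero if lo <= c <= hi (requires lo <= hi).
 * If c < lo, the unsigned subtraction wraps to a very large value,
 * so a single comparison covers both bounds. */
int in_range(unsigned c, unsigned lo, unsigned hi)
{
    return c - lo <= hi - lo;
}

/* Illustrative use: Shift-JIS lead bytes for JIS X 0208
 * lie in 0x81..0x9F or 0xE0..0xEF. */
int is_sjis_lead(unsigned char b)
{
    return in_range(b, 0x81u, 0x9Fu) || in_range(b, 0xE0u, 0xEFu);
}
```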

18 Dec, 1997 1 commit
- Fixed iso2022_write_escapes() (as used by JIS encoding) so that it actually works. · 10298658
  Simon Middleton authored 27 years ago
```
It also now assumes that the first write encoding is already set up.

Version 0.10. Tagged as 'Unicode-0_10'
```
  10298658
10 Dec, 1997 1 commit
- Changed default language for Unicode encodings to be ANY rather than english. · e487a05f
  Simon Middleton authored 27 years ago
```
Version 0.09. Tagged as 'Unicode-0_09'
```
  e487a05f
08 Dec, 1997 1 commit

Fixed when SS1 or SS2 followed by a set change by disallowing... · 67178217

Simon Middleton authored 27 years ago

Fixed the case where SS1 or SS2 is followed by a set change, by disallowing control characters after single shifts.

Made encoding_table_ptr and encoding_n_table_entries check for null tables.
Moved 'Lm' type characters from marks to letters in mkunictype.

Version 0.08. Tagged as 'Unicode-0_08'

67178217

02 Dec, 1997 1 commit

Recreated acorn.c to hold new encoding cdAcornFuzzy. This writes an · 45d01bdf

Simon Middleton authored 27 years ago

Acorn Latin1 encoding using fuzzy mapping to get the greatest number
of displayable characters. Reads as Acorn.Latin1.

Version 0.07. Tagged as 'Unicode-0_07'

45d01bdf

21 Nov, 1997 1 commit

Added new file 'languages.h' with some ISO639 language codes. · fa3fa475

Simon Middleton authored 27 years ago

Added a default language field to each encoding (using above codes).
Added a max char size field to each encoding.
Tidied up some of the reencoders' behaviour when the output ptr is NULL.
Fixed a load of charset numbers which were wrong.
New UTF8 function to skip multiple characters in a string.
Fixed RISC OS build which was out of date.

Version 0.04. Tagged as 'Unicode-0_04'

fa3fa475
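
The commit does not name the new UTF-8 skipping function or give its signature, so the following is only a sketch of what such a routine might look like. `utf8_skip_chars` is a hypothetical name, and the input is assumed to be well-formed, NUL-terminated UTF-8.

```c
#include <stddef.h>

/* Advance s past up to n UTF-8 encoded characters.
 * Stops early at the terminating NUL. Assumes well-formed input. */
const char *utf8_skip_chars(const char *s, size_t n)
{
    while (n > 0 && *s != '\0') {
        s++;                                        /* the start byte */
        while (((unsigned char)*s & 0xC0u) == 0x80u)
            s++;                                    /* its continuation bytes */
        n--;
    }
    return s;
}
```

Skipping two characters of the string "a", U+20AC, "b" (bytes 0x61 0xE2 0x82 0xAC 0x62) leaves the pointer on the final "b", four bytes in.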

12 Nov, 1997 1 commit

Fixed encoding table so that module builds will work. · 1c323496

Simon Middleton authored 27 years ago

Made all tables be on linked list to avoid static copies of pointers.
Removed redundant 8bit files.

Version 0.03. Tagged as 'Unicode-0_03'

1c323496

11 Nov, 1997 2 commits
- Removed use of external encoding_load_map_file(). Now references · 72e1de26
  Simon Middleton authored 27 years ago
```
Unicode:Encodings directly.

Version 0.02. Tagged as 'Unicode-0_02'
```
  72e1de26
- Initial version checked in · 36e3c744
  Simon Middleton authored 27 years ago
```
Version 0.01. Not tagged
```
  36e3c744