1. 05 Dec, 2008 1 commit
    • Fix bugs and inconsistencies in encoding handlers. · 69a25816
      Andrew Hodgkinson authored
        Fix inconsistency in handling illegal byte sequences.
        Convert surrogate codepoints and U+FFFE, U+FFFF to U+FFFD.
        Also, a few extra mappings.
      Detail:
        enc_utf8.c: 0x80 is a continuation byte. Map stray ones to U+FFFD.
                    Reset the count of expected continuation bytes to 0 when
                    encountering illegal byte sequences. Previously, if the character
                    callback returned non-zero, this count would not be reset, thus
                    leaving the codec in an inconsistent state. Additionally, we no
                    longer consume the illegal continuation byte: instead, we process
                    it as a start byte next time round.
        encoding.c: Do not load extension tables for ISO-8859-{1,2,9,10,15,16}.
                    If these are needed, it's probably best that different charset
                    names are used rather than overloading 8859-n.
        iso2022.c:  Permit SS2/3 escape sequences for EUC encode/decode.
                    Disable C1 chara...
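The enc_utf8.c recovery behaviour described in this commit can be illustrated with a minimal sketch. This is not the library's code: `utf8_state`, `sanitise` and `utf8_decode` are hypothetical names, and the sketch assumes a simple output array rather than the character callback the commit mentions. It shows three points from the message: a stray continuation byte becomes U+FFFD, an illegal sequence resets the expected-continuation count and leaves the offending byte to be retried as a start byte, and surrogates plus U+FFFE/U+FFFF become U+FFFD. Overlong forms and values above U+10FFFF are deliberately not checked here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define REPLACEMENT 0xFFFDu

/* Hypothetical decoder state; not the library's actual structure. */
typedef struct {
    uint32_t current;   /* code point being assembled */
    int      expected;  /* continuation bytes still expected */
} utf8_state;

/* Map surrogates and the noncharacters U+FFFE/U+FFFF to U+FFFD. */
static uint32_t sanitise(uint32_t c)
{
    if ((c >= 0xD800u && c <= 0xDFFFu) || c == 0xFFFEu || c == 0xFFFFu)
        return REPLACEMENT;
    return c;
}

/* Decode n bytes from in, storing code points in out (capacity cap).
 * Returns the number of code points stored. */
size_t utf8_decode(utf8_state *st, const unsigned char *in, size_t n,
                   uint32_t *out, size_t cap)
{
    size_t i = 0, count = 0;
    assert(st != NULL && in != NULL && out != NULL);

    while (i < n && count < cap) {
        unsigned char b = in[i];

        if (st->expected > 0) {
            if ((b & 0xC0u) == 0x80u) {
                /* Valid continuation byte: accumulate six more bits. */
                st->current = (st->current << 6) | (b & 0x3Fu);
                i++;
                if (--st->expected == 0)
                    out[count++] = sanitise(st->current);
            } else {
                /* Illegal sequence: reset the count before emitting,
                 * and do not consume b, so the next iteration treats
                 * it as a start byte. */
                st->expected = 0;
                out[count++] = REPLACEMENT;
            }
            continue;
        }

        i++;
        if (b < 0x80u) {
            out[count++] = b;                     /* ASCII */
        } else if (b < 0xC0u) {
            out[count++] = REPLACEMENT;           /* stray 0x80..0xBF */
        } else if (b < 0xE0u) {
            st->current = b & 0x1Fu; st->expected = 1;
        } else if (b < 0xF0u) {
            st->current = b & 0x0Fu; st->expected = 2;
        } else if (b < 0xF8u) {
            st->current = b & 0x07u; st->expected = 3;
        } else {
            out[count++] = REPLACEMENT;           /* 0xF8..0xFF */
        }
    }
    return count;
}
```

Resetting the count before anything else runs is the design point: whatever happens while the replacement character is being delivered, the decoder is already back in its idle state.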
  2. 26 Aug, 2005 1 commit
  3. 25 Aug, 2005 1 commit
  4. 01 Jul, 2004 1 commit
    • Build changes · c1d14222
      Steve Revill authored
      Detail:
        Builds on 32-bit machine even with 26-bit environment.
        Fixed c.encoding so that it builds with newer tools.
      Admin:
        Works in Baseline 500 build.
      
      Version 0.53. Tagged as 'Unicode-0_53'
  5. 05 Mar, 2004 1 commit
    • Change merged from Pace repository: · 37b69d9e
      Steve Revill authored
      > Summary:
      >   Merged changes from branch tree
      >   Reversed previous change
      > Detail:
      >
      > * Merged a few changes/fixes from the Unicode library in
      >  branch's tree.
      >
      > * Reversed Steve's change from version 0.50. The change wasn't
      >  necessary, and with the changed definition of NOT_USED in this
      >  version, it compiles fine with cc 5.45.
      >
      > * Small comment change in unix.c. It now states that the file
      >  isn't equivalent to any in the branch tree.
      >
      > Admin:
      >   Built and briefly tested using TextConv utility on Risc PC.
      
      Version 0.52. Tagged as 'Unicode-0_52'
  6. 23 Jul, 2002 1 commit
  7. 10 Jun, 2002 2 commits
    • Removed some warnings on unused variables. · 4671c87b
      Stewart Brodie authored
        Fixed a comparison of a plain char (signedness issue)
      Admin:
        These were from NCBrowser 5.28 too - but were forgotten in the last check-in :-(
        I've not tried using this library.
      
      
      Version 0.48. Tagged as 'Unicode-0_48'
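The signedness issue with plain char mentioned in this commit is a common portability trap. Whether plain `char` is signed is implementation-defined, so a comparison that works with one compiler can be always false with another. The sketch below is an illustration only; `is_high_byte` is a hypothetical helper, not a function from this library, and the actual comparison that was fixed is not shown in the commit message.

```c
/* Hypothetical helper: returns 1 if the byte has its top bit set.
 *
 * Writing (c >= 0x80) directly is always false where plain char is
 * signed, because the value is then in the range -128..127. Converting
 * to unsigned char first gives the same answer on every compiler. */
int is_high_byte(char c)
{
    return (unsigned char)c >= 0x80u;
}
```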
    • Merge of bug fixes from NCBrowser tree. · 0524cabb
      Stewart Brodie authored
      Detail:
        Buffer overrun fixed; some buffer counting problems fixed too.  There
          are now helpful initialisation and tidy-up routines you can call too
          (encoding_initialise and encoding_tidyup).
      Admin:
        I've built this with cc 5.45 in basic build environment - it built OK.
        This source code now matches that in NCBrowser 5.28.
      
      
      Version 0.47. Tagged as 'Unicode-0_47'
  8. 13 Oct, 2000 1 commit
    • More synchronisation with Unicode lib in branched tree · 4e5abb29
      John Beranek authored
      Detail:
        Added some changes from Unicode lib in branched tree.  All basically
         type changes.  This appears to be because other compilers are
         more picky about types than armcc.
      
      Admin:
        Will add 0.46 VersionNum file into branched tree, and all will be
         fully synchronised.
      
      
      Version 0.46. Tagged as 'Unicode-0_46'
  9. 05 Oct, 2000 1 commit
    • Copyright message changes + changes from branch + Unified branched/non-branched builds · b5fafb8f
      John Beranek authored
      Detail:
        Copyright messages changed from E-14 to Pace throughout, filename
         placed at top of file throughout, instead of in just some files.
      
        Merged branch's fixes into our code base, plus made it possible to
         get nice debug output in branched tree, and vfprintf() to stderr in
         RISC OS tree.  Exactly same source used in branched tree now (apart
         from OS specific files riscos.c and unix.c moving into layers
         directory structure).
      
      Admin:
        Built for branched, both Unix and RISC OS.
        Built in RISC OS tree, and compiled into TextConv.
      
      
      Version 0.45. Tagged as 'Unicode-0_45'
  10. 16 Sep, 1999 1 commit
  11. 14 Sep, 1999 1 commit
  12. 13 Sep, 1999 1 commit
  13. 04 Aug, 1999 1 commit
    • Added Windows-1254. · 47a736c9
      Kevin Bracey authored
      Changed default language of Latin-5 (ISO 8859-9) from English to Turkish.
      
      Version 0.40. Tagged as 'Unicode-0_40'
  14. 26 Mar, 1999 1 commit
  15. 23 Mar, 1999 1 commit
  16. 18 Mar, 1999 1 commit
  17. 12 Mar, 1999 2 commits
  18. 11 Mar, 1999 1 commit
    • Implemented SCSU and UTF-7. · 30550b96
      Kevin Bracey authored
      Added encoding_set_flags().
      Proper handling of byte order marks in UTF-16 and UCS-4.
      Fixed UTF-16 surrogate writing.
      Adjusted various MIME charset identifiers.
      Incorporated latest Unicode Character Database (2.1.8).
      Added "current system alphabet" encoding.
      Created "TextConv" command line character set conversion utility.
      
      Version 0.34. Tagged as 'Unicode-0_34'
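For the UTF-16 surrogate writing mentioned in this commit, the standard encoding rule can be sketched as follows. The commit does not say what the bug was, so this is only an illustration of the rule itself; `utf16_encode` is a hypothetical name and not part of the library's API.

```c
#include <stdint.h>

/* Hypothetical helper: write code point c as UTF-16 code units.
 * Returns the number of units written (1 or 2), or 0 if c cannot be
 * encoded (a surrogate value, or above U+10FFFF). */
int utf16_encode(uint32_t c, uint16_t out[2])
{
    if ((c >= 0xD800u && c <= 0xDFFFu) || c > 0x10FFFFu)
        return 0;

    if (c < 0x10000u) {
        out[0] = (uint16_t)c;           /* BMP: one code unit */
        return 1;
    }

    c -= 0x10000u;                      /* leaves a 20-bit value */
    out[0] = (uint16_t)(0xD800u + (c >> 10));     /* high surrogate */
    out[1] = (uint16_t)(0xDC00u + (c & 0x3FFu));  /* low surrogate */
    return 2;
}
```

With this rule, U+1F600 is written as the pair 0xD83D 0xDE00.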
  19. 24 Feb, 1999 2 commits
  20. 23 Feb, 1999 2 commits
  21. 05 Jan, 1999 1 commit
  22. 16 Nov, 1998 1 commit
    • Updated all the writers to ignore the NULL_UCS4 character (as had been previously added to the iso2022_escapes case). Any new writers should flush any pending characters they may have at this point. · 103112be
      Simon Middleton authored
      Also updated enc_UCS4.c and utf8.c to turn all illegal characters
      (top bit set) into U+FFFD.
      
      Version 0.28. Tagged as 'Unicode-0_28'
  23. 06 Nov, 1998 1 commit
  24. 15 Sep, 1998 1 commit
  25. 10 Sep, 1998 1 commit
  26. 04 Sep, 1998 1 commit
  27. 06 Mar, 1998 1 commit
  28. 05 Jan, 1998 1 commit
    • Fixed autojp state machine. It wasn't resetting 'state' to HAD_NONE after changing whatcode. So basically it was lucky it ever worked. Also rewrote the various range tests to only use one compare per case. · 407bccff
      Simon Middleton authored
      
      Changed the 'for_encoding' parameter to encoding_write() to an enumeration.
      Added a new type of writing where, if the character cannot be encoded,
      the function returns -1 rather than writing a default character.
      Added the pseudo-charsets csAutodetectJP and csEUCorShiftJIS to the encoding
      table so that they return the correct default language (ja).
      Added function to remove unused encoding tables (must be called explicitly).
      Fixed usage counting in iso2022 (I think).
      When looking up an encoding name, try stripping 'x-' or 'X-' off the
      front if it can't be found on the first pass.
      
      Version 0.12. Tagged as 'Unicode-0_12'
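One common way to get a range test down to a single comparison, as this commit describes for autojp, is the unsigned-subtraction idiom sketched below. The commit does not say which technique was used, so this is an assumption; `in_range` and `is_sjis_lead_byte` are hypothetical names, and the Shift_JIS lead-byte ranges are used purely as an illustration.

```c
/* Returns 1 if lo <= c <= hi, using one comparison. Requires lo <= hi.
 * If c is below lo, the unsigned subtraction wraps round to a very
 * large value, which fails the test just as a second compare would. */
int in_range(unsigned c, unsigned lo, unsigned hi)
{
    return (c - lo) <= (hi - lo);
}

/* Illustrative use: Shift_JIS lead bytes for JIS X 0208. */
int is_sjis_lead_byte(unsigned char b)
{
    return in_range(b, 0x81u, 0x9Fu) || in_range(b, 0xE0u, 0xEFu);
}
```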
  29. 18 Dec, 1997 1 commit
  30. 10 Dec, 1997 1 commit
  31. 08 Dec, 1997 1 commit
    • Fixed the case where SS1 or SS2 is followed by a set change, by disallowing control characters after single shifts. · 67178217
      Simon Middleton authored
      
      Made encoding_table_ptr and encoding_n_table_entries check for null tables.
      Moved 'Lm' type characters from marks to letters in mkunictype.
      
      Version 0.08. Tagged as 'Unicode-0_08'
  32. 02 Dec, 1997 1 commit
  33. 21 Nov, 1997 1 commit
    • Added new file 'languages.h' with some ISO639 language codes. · fa3fa475
      Simon Middleton authored
      Added a default language field to each encoding (using above codes).
      Added a max char size field to each encoding.
      Tidied up some of the reencoders behaviour when output ptr NULL.
      Fixed a load of charset numbers which were wrong.
      New UTF8 function to skip multiple characters in a string.
      Fixed RISC OS build which was out of date.
      
      Version 0.04. Tagged as 'Unicode-0_04'
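A function that skips multiple characters in a UTF-8 string, as added in this commit, might look roughly like the sketch below. The real function's name and signature are not given in the message, so `utf8_skip` is hypothetical, and the sketch assumes well-formed, NUL-terminated input.

```c
#include <stddef.h>

/* Hypothetical helper: return a pointer n characters into the
 * NUL-terminated UTF-8 string s, or a pointer to the terminator if
 * the string holds fewer than n characters. */
const char *utf8_skip(const char *s, size_t n)
{
    const unsigned char *p = (const unsigned char *)s;

    while (n > 0 && *p != '\0') {
        p++;                            /* step over the start byte */
        while ((*p & 0xC0u) == 0x80u)   /* then its continuation bytes */
            p++;
        n--;
    }
    return (const char *)p;
}
```

Because continuation bytes always have the bit pattern 10xxxxxx, the function never needs to decode the characters it passes over.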
  34. 12 Nov, 1997 1 commit
  35. 11 Nov, 1997 2 commits