    Fix bugs and inconsistencies in encoding handlers. · 69a25816
    Andrew Hodgkinson authored
      Fix inconsistency in handling illegal byte sequences.
      Convert surrogate codepoints and U+FFFE, U+FFFF to U+FFFD.
      Also, a few extra mappings.
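As an illustration of the second point above, here is a minimal sketch of the codepoint filtering. It assumes a hypothetical helper named `sanitise_codepoint`. The name, signature and `REPLACEMENT_CHAR` macro are illustrative only and are not taken from the actual encoding handlers. The sketch maps UTF-16 surrogates (U+D800..U+DFFF) and the noncharacters U+FFFE and U+FFFF to U+FFFD, and passes everything else through unchanged.

```c
#include <stdint.h>

/* Illustrative constant; not necessarily the name used in the codebase. */
#define REPLACEMENT_CHAR 0xFFFDu

/* Hypothetical helper: replace codepoints that must not be emitted
 * with U+FFFD, and return every other codepoint unchanged. */
static uint32_t sanitise_codepoint(uint32_t c)
{
    /* Surrogate codepoints are not valid Unicode scalar values. */
    if (c >= 0xD800u && c <= 0xDFFFu)
        return REPLACEMENT_CHAR;

    /* U+FFFE and U+FFFF are noncharacters. */
    if (c == 0xFFFEu || c == 0xFFFFu)
        return REPLACEMENT_CHAR;

    return c;
}
```

A decoder would apply such a filter to each fully assembled codepoint before passing it to its character callback.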
    Detail:
      enc_utf8.c: 0x80 is a continuation byte. Map stray continuation bytes
                  to U+FFFD. Reset the count of expected continuation bytes to
                  0 when encountering illegal byte sequences. Previously, if the
                  character callback returned non-zero, this count was not
                  reset, which left the codec in an inconsistent state.
                  Additionally, we no longer consume the byte that made the
                  sequence illegal. Instead, it is processed as a start byte on
                  the next iteration.
      encoding.c: Do not load extension tables for ISO-8859-{1,2,9,10,15,16}.
                  If these are needed, it's probably best that different charset
                  names are used rather than overloading 8859-n.
      iso2022.c:  Permit SS2/3 escape sequences for EUC encode/decode.
                  Disable C1 chara...
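The enc_utf8.c changes above can be illustrated with a minimal streaming UTF-8 decoder sketch. The `utf8_state` struct, the `char_cb` callback type, `utf8_decode` and the `collect` example callback are all hypothetical. They are not the real enc_utf8.c interfaces. The sketch shows three behaviours from the commit message. First, a stray continuation byte, including 0x80, yields U+FFFD. Second, the expected-continuation count is reset to 0 *before* the callback is invoked, so a non-zero (stop) return cannot leave the state inconsistent. Third, the byte that breaks a sequence is not consumed and is re-examined as a start byte. To keep the example short, it does not check for overlong forms or surrogates.

```c
#include <stddef.h>
#include <stdint.h>

#define REPLACEMENT_CHAR 0xFFFDu

/* Hypothetical callback: return non-zero to stop decoding. */
typedef int (*char_cb)(uint32_t c, void *ctx);

/* Hypothetical decoder state, kept between calls for streaming input. */
typedef struct {
    uint32_t acc;      /* codepoint being assembled */
    int      expected; /* continuation bytes still expected */
} utf8_state;

/* Decode up to n bytes; returns the number of bytes consumed. */
static size_t utf8_decode(utf8_state *st, const unsigned char *s, size_t n,
                          char_cb cb, void *ctx)
{
    size_t i = 0;
    while (i < n) {
        unsigned char b = s[i];
        if (st->expected > 0) {
            if ((b & 0xC0) == 0x80) {            /* valid continuation byte */
                st->acc = (st->acc << 6) | (b & 0x3Fu);
                i++;
                if (--st->expected == 0 && cb(st->acc, ctx))
                    return i;
                continue;
            }
            /* Illegal sequence: reset the count first, so the state stays
             * consistent even if the callback asks us to stop. Do not
             * consume b; it is re-examined as a start byte below. */
            st->expected = 0;
            if (cb(REPLACEMENT_CHAR, ctx))
                return i;
            continue;
        }
        i++;
        if (b < 0x80) {                          /* ASCII */
            if (cb(b, ctx))
                return i;
        } else if ((b & 0xE0) == 0xC0) {         /* 2-byte lead */
            st->acc = b & 0x1Fu; st->expected = 1;
        } else if ((b & 0xF0) == 0xE0) {         /* 3-byte lead */
            st->acc = b & 0x0Fu; st->expected = 2;
        } else if ((b & 0xF8) == 0xF0) {         /* 4-byte lead */
            st->acc = b & 0x07u; st->expected = 3;
        } else {
            /* Stray continuation byte (0x80..0xBF, including 0x80)
             * or an invalid lead byte (0xF8..0xFF). */
            if (cb(REPLACEMENT_CHAR, ctx))
                return i;
        }
    }
    return i;
}

/* Example callback: collects decoded characters into a small buffer. */
typedef struct { uint32_t out[16]; size_t len; } collector;

static int collect(uint32_t c, void *ctx)
{
    collector *col = ctx;
    if (col->len < 16)
        col->out[col->len++] = c;
    return 0; /* never ask the decoder to stop */
}
```

For example, decoding the bytes `C3 41 80 C3 A9` with `collect` produces U+FFFD (the C3 lead byte is followed by a non-continuation byte), then 'A' (0x41 reprocessed as a start byte), then U+FFFD (stray 0x80), then U+00E9.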