[PD-dev] Shrink s_utf8.[ch]?

Sat Oct 8 17:08:43 CEST 2011

Greets,

It looks like a lot of the symbols in s_utf8.h and s_utf8.c are not used
anywhere in Vanilla.

These are used:

    isutf
    u8_wc_nbytes
    u8_wc_toutf8
    u8_wc_toutf8_nul
    u8_offset
    u8_charnum
    u8_inc
    u8_dec

These are not:

    u8_toucs
    u8_toutf8
    u8_wcs_nbytes         (only declaration in s_utf8.h)
    u8_inc_ptr
    u8_dec_ptr
    u8_nextchar
    u8_strlen             (only definition in s_utf8.c)
    trailingBytesForUTF8  (static array in s_utf8.c)
    offsetsFromUTF8       (static array in s_utf8.c)

While trying to clean up memory errors from Object text editing, I found that
the UTF-8 manipulation library Pd uses is a poor match for Pd's strings: they
are apparently not NUL-terminated, but several of the UTF-8 manipulation
routines malfunction without a terminating NUL.  I've supplied a patch for
three such routines, but it seems better to eliminate any that we don't need
and remove the maintenance burden rather than attempt to fix them.
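
To illustrate the mismatch, here is a minimal sketch of a length-bounded
character count that never reads past the buffer, so it does not depend on a
terminating NUL.  The name bounded_u8_strlen and the IS_UTF8_START macro are
hypothetical and not part of s_utf8.[ch]; the macro is meant to apply the same
continuation-byte test that isutf does, as far as I can tell.  It also assumes
the input is well-formed UTF-8.

```c
#include <stddef.h>

/* True when a byte is NOT a UTF-8 continuation byte (10xxxxxx),
 * i.e. when it begins a new character.  Hypothetical macro. */
#define IS_UTF8_START(c) (((c) & 0xC0) != 0x80)

/* Count the characters in the first nbytes bytes of s.
 * Hypothetical helper: it takes an explicit byte length instead of
 * scanning for a NUL, so it is safe on unterminated buffers.
 * Assumes s holds well-formed UTF-8. */
static size_t bounded_u8_strlen(const char *s, size_t nbytes)
{
    size_t count = 0;
    size_t i;

    for (i = 0; i < nbytes; i++) {
        if (IS_UTF8_START((unsigned char)s[i]))
            count++;
    }
    return count;
}
```

A routine that loops until it sees a zero byte would instead run off the end
of such a buffer, which is the kind of memory error described above.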

Would a patch that removes all the unused code from s_utf8.h and s_utf8.c be
well-received?  If a need arises in the future for any of the deleted
functions, they can be resurrected from version control and vetted at that
time.

Alternatively, would it be worthwhile to explore relying on the UTF-8
manipulation routines exposed via "tcl.h" and eliminating s_utf8.h and
s_utf8.c entirely?

Marvin Humphrey