[PD] request for objections: any2string -> unsigned char

Bryan Jurish moocow at ling.uni-potsdam.de
Thu Jan 15 17:12:40 CET 2009


moin Mathieu, moin all,

On 2009-01-15 16:33:03, Mathieu Bouchard <matju at artengine.ca> appears to
have written:
> On Thu, 15 Jan 2009, Bryan Jurish wrote:
> 
>> Would anyone object if the [any2string] semantics were changed so that
>> only "unsigned char" values in the range (0..255) get output, rather
>> than (as is currently the case) "signed char" values in the range
>> (-128..127)?
>
> What's important to me is that the Pd user does not struggle with making
> pd interpret UTF-8 variable-length encoding, and instead struggles with
> making pd work with lists of characters, which is already enough work
> anyway.

Agreed (in principle at least)...  At the risk of repeating myself, I
wrote [any2string] and [string2any] as quick ugly hacks to get some sort
of rudimentary string handling in pd.  Roman mentioned a few other
externals (e.g. [comport]) which expect unsigned raw byte values, which
I think is sufficient reason to change the (byte-oriented) conventions
of [any2string].
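
To make the signed/unsigned difference concrete, here is a minimal sketch in
C. It is not the actual [any2string] source; the helper name bytes_to_floats
and its as_unsigned flag are made up for illustration. It copies raw bytes
into floats, the way a Pd list of float atoms would carry them, under either
convention.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not taken from [any2string]): copy n raw bytes
 * into a float array. With as_unsigned != 0 each byte maps to 0..255,
 * otherwise to -128..127 (two's-complement reading of the same byte). */
static void bytes_to_floats(const void *src, size_t n, float *out,
                            int as_unsigned)
{
    const unsigned char *u = (const unsigned char *)src;
    size_t i;

    assert(src != NULL && out != NULL);
    for (i = 0; i < n; i++) {
        int v = u[i];                /* always 0..255 here */
        if (!as_unsigned && v > 127)
            v -= 256;                /* signed view: 128..255 -> -128..-1 */
        out[i] = (float)v;
    }
}
```

With the three bytes 'a', 0xC3, 0xA9 (the UTF-8 encoding of "aé"), the signed
convention gives 97 -61 -87 and the unsigned one gives 97 195 169. Only the
second form is what an object expecting raw byte values can use directly.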

Unicode might be more immediately intuitive to most users, but when it
comes down to it, byte-strings are IMHO the more basic representation (a
char* is still a char*, even in this post-Unicode world).  Some of us
even still use non-Unicode encodings by default.  A good string handling
mechanism should have a good general default representation (e.g. as
UTF-${MachineWordBits}), but should likewise allow access to "raw" byte
strings, and be able to accommodate various encodings.  Not that I'm
really hankering to write any of that, mind you ;-)

Perhaps a better name for the external as I think of it would be
[any2bytes].  I'm perfectly willing to cede the "string" name to
something better (Martin's string patch comes to mind), but that's just
a labelling issue (and since variable names are arbitrary, and externals
are in some sense variables, external names must therefore also be
arbitrary ;-)

> I like that [list length] gives me the number of characters and
> not the number of bytes, because the latter is rarely significant.

... except if you're building or reading a persistent index for a
large file, in which case tell() & seek() are likely to be a wee bit
faster than parsing and counting variable-length-encoded characters ...
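
A rough sketch of why, in C, assuming valid UTF-8 input (the function names
utf8_count and utf8_offset are invented for this example): finding the n-th
character means scanning every byte before it, whereas a stored byte offset
can be handed straight to a seek call.

```c
#include <assert.h>
#include <stddef.h>

/* Count code points in n bytes of (assumed valid) UTF-8 by counting
 * every byte that is not a continuation byte (10xxxxxx). */
static size_t utf8_count(const unsigned char *s, size_t n)
{
    size_t i, count = 0;

    assert(s != NULL);
    for (i = 0; i < n; i++)
        if ((s[i] & 0xC0) != 0x80)
            count++;
    return count;
}

/* Byte offset of the k-th code point (0-based), or n if there are
 * fewer than k+1 code points. This is a linear scan from the start. */
static size_t utf8_offset(const unsigned char *s, size_t n, size_t k)
{
    size_t i, seen = 0;

    assert(s != NULL);
    for (i = 0; i < n; i++) {
        if ((s[i] & 0xC0) != 0x80) {
            if (seen == k)
                return i;
            seen++;
        }
    }
    return n;
}
```

For the four bytes 'a', 0xC3, 0xA9, 'z' the byte length is 4 but the
character count is 3, and character 2 ('z') sits at byte offset 3. An index
that stores byte offsets skips the scan entirely.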

marmosets,
	Bryan

-- 
Bryan Jurish                           "There is *always* one more bug."
jurish at ling.uni-potsdam.de      -Lubarsky's Law of Cybernetic Entomology




