[PD] request for objections: any2string -> unsigned char
Bryan Jurish
moocow at ling.uni-potsdam.de
Thu Jan 15 17:12:40 CET 2009
moin Mathieu, moin all,
On 2009-01-15 16:33:03, Mathieu Bouchard <matju at artengine.ca> appears to
have written:
> On Thu, 15 Jan 2009, Bryan Jurish wrote:
>
>> Would anyone object if the [any2string] semantics were changed so that
>> only "unsigned char" values in the range (0..255) get output, rather
>> than (as is currently the case) "signed char" values in the range
>> (-128..127)?
>
> What's important to me is that the Pd user does not struggle with making
> pd interpret UTF-8 variable-length encoding, and instead struggles with
> making pd work with lists of characters, which is already enough work
> anyway.
Agreed (in principle at least)... At the risk of repeating myself, I
wrote [any2string] and [string2any] as quick ugly hacks to get some sort
of rudimentary string handling in pd. Roman mentioned a few other
externals (e.g. [comport]) which expect unsigned raw byte values, which
I think is sufficient reason to change the (byte-oriented) conventions
of [any2string].
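To illustrate the signed/unsigned distinction at issue, here is a minimal C sketch. It is not the actual [any2string] source; the helper name bytes_to_unsigned and its signature are hypothetical. On platforms where plain char is signed, bytes above 127 come out negative unless they are cast through unsigned char first.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, NOT the real [any2string] code: copy the bytes of a
 * C string into an array of ints, one int per byte, each in 0..255.
 * Returns the number of values written (at most max). */
static size_t bytes_to_unsigned(const char *s, int *out, size_t max)
{
    size_t i, n = strlen(s);
    if (n > max)
        n = max;
    for (i = 0; i < n; i++) {
        /* Casting through unsigned char maps a negative plain-char value
         * such as -61 to 195; without the cast, a signed char would be
         * sign-extended and land in -128..127. */
        out[i] = (unsigned char)s[i];
    }
    return n;
}
```

For example, passing the two-byte UTF-8 encoding of U+00E9 ("\xc3\xa9") yields 195 and 169, rather than -61 and -87 as a signed-char reading would give.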
Unicode might be more immediately intuitive to most users, but when it
comes down to it, byte-strings are IMHO the more basic representation (a
char* is still a char*, even in this post-unicode world). Some of us
even still use non-unicode encodings by default. A good string handling
mechanism should have a good general default representation (e.g. as
UTF-${MachineWordBits}), but should likewise allow access to "raw" byte
strings, and be able to accommodate various encodings. Not that I'm
really hankering to write any of that, mind you ;-)
Perhaps a better name for the external as I think of it would be
[any2bytes]. I'm perfectly willing to cede the "string" name to
something better (Martin's string patch comes to mind), but that's just
a labelling issue (and since variable names are arbitrary, and externals
are in some sense variables, external names must therefore also be
arbitrary ;-)
> I like that [list length] gives me the number of characters and
> not the number of bytes, because the latter is rarely significant.
... except if you're building or reading a persistent index for a
large file, in which case tell() & seek() are likely to be a wee bit
faster than parsing and counting variable-length-encoded characters ...
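The difference between the two counts can be sketched in C as follows. The function name utf8_length is made up for illustration, and the code assumes valid UTF-8 input. The byte length is a single strlen() call (or a file offset), whereas the character count requires walking every byte.

```c
#include <stddef.h>
#include <string.h>   /* strlen(), for the byte count being compared */

/* Hypothetical helper: count UTF-8 code points by skipping continuation
 * bytes, which always have the bit pattern 10xxxxxx.
 * Assumes valid UTF-8; no validation is performed. */
static size_t utf8_length(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;   /* lead byte or plain ASCII: starts a new character */
    }
    return n;
}
```

For "na\xc3\xafve" (the word "naïve" in UTF-8), strlen() returns 6 while utf8_length() returns 5.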
marmosets,
Bryan
--
Bryan Jurish "There is *always* one more bug."
jurish at ling.uni-potsdam.de -Lubarsky's Law of Cybernetic Entomology