[PD] request for objections: any2string -> unsigned char
Mathieu Bouchard
matju at artengine.ca
Thu Jan 15 20:45:13 CET 2009
On Thu, 15 Jan 2009, Bryan Jurish wrote:
> Unicode might be more immediately intuitive to most users, but when it
> comes down to it, byte-strings are IMHO the more basic representation (a
> char* is still a char*, even in this post-unicode world).
What happened is that people switched to UTF-8 instead of some fixed-size
encoding because many apps that assume that a character is a byte will
work anyway. Just don't ask those apps how many characters there are in
a string, though. To them, every "special" character looks like a pair of
characters (when it is not a triplet, or at most a quadruplet).
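
To make the byte-versus-character mismatch concrete, here is a minimal sketch in C. It assumes the input is valid, NUL-terminated UTF-8; the name utf8_count is made up for this example and is not part of Pd or of any2string.

```c
#include <stddef.h>
#include <string.h>

/* Count code points in a NUL-terminated string, assuming valid UTF-8.
 * Continuation bytes look like 10xxxxxx; every other byte starts a
 * new code point, so we count only those. */
size_t utf8_count(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    }
    return n;
}
```

For a string such as "t\xc3\xa9l" (the UTF-8 bytes of "tél"), strlen() reports 4 bytes, while utf8_count() reports 3 characters. A byte-oriented app sees the "é" as two separate characters.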
> A good string handling mechanism should have a good general default
> representation (e.g. as UTF-${MachineWordBits}), but should likewise
> allow access to "raw" byte strings, and be able to accommodate various
> encodings. Not that I'm really hankering to write any of that, mind you
> ;-) Perhaps a better name for the external as I think of it would be
> [any2bytes]. I'm perfectly willing to cede the "string" name to
> something better (Martin's string patch comes to mind),
I gather that it'll take a long time before Pd gets unicode support...
> ... except if you're building rsp. reading a persistent index for a
> large file, in which case tell() & seek() are likely to be a wee bit
> faster than parsing and counting variable-length-encoded characters ...
right.
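
A rough sketch of that kind of index, in C. It assumes the file is opened in binary mode and that an index of line-start offsets is what is wanted; index_lines and read_line_at are hypothetical names for this example only.

```c
#include <stdio.h>

/* Record the byte offset at which each line starts.
 * Returns the number of offsets stored (at most max). */
size_t index_lines(FILE *f, long *offsets, size_t max)
{
    size_t n = 0;
    rewind(f);
    while (n < max) {
        long pos = ftell(f);          /* byte offset of the next line */
        int c;
        if (pos < 0)
            break;
        c = getc(f);
        if (c == EOF)
            break;
        offsets[n++] = pos;
        while (c != '\n' && c != EOF) /* skip the rest of the line */
            c = getc(f);
        if (c == EOF)
            break;
    }
    return n;
}

/* Jump straight to line k (0-based) and read it. The encoding of the
 * preceding lines never has to be parsed. */
char *read_line_at(FILE *f, const long *offsets, size_t k,
                   char *buf, int size)
{
    if (fseek(f, offsets[k], SEEK_SET) != 0)
        return NULL;
    return fgets(buf, size, f);
}
```

The offsets are plain byte counts, so they stay valid whatever the encoding is, which is the advantage being described. The price is that the index says nothing about how many characters precede a given line.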
_ _ __ ___ _____ ________ _____________ _____________________ ...
| Mathieu Bouchard - tél:+1.514.383.3801, Montréal, Québec