[PD] request for objections: any2string -> unsigned char

Thu Jan 15 20:45:13 CET 2009

On Thu, 15 Jan 2009, Bryan Jurish wrote:

> Unicode might be more immediately intuitive to most users, but when it
> comes down to it, byte-strings are IMHO the more basic representation (a
> char* is still a char*, even in this post-unicode world).

What happened is that people switched to UTF-8 instead of some fixed-size 
encoding because many apps that assume that a character is a byte will 
work anyway. Just don't ask those apps to say how many characters there 
are in a string, though. You have to pretend that each "special"
character is a pair of characters instead (when it is not a triplet or a
quadruplet).
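
To make the byte/character mismatch concrete, here is a minimal sketch in C. The function name utf8_count_codepoints is made up for illustration and is not part of Pd or any2string. It assumes the input is valid, NUL-terminated UTF-8 and simply skips continuation bytes (those of the form 10xxxxxx).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count Unicode code points in a NUL-terminated
 * string, assuming it holds valid UTF-8. Every byte that is NOT a
 * continuation byte (bit pattern 10xxxxxx) starts a new code point. */
size_t utf8_count_codepoints(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    }
    return n;
}
```

For "Montr\xc3\xa9" "al" (the UTF-8 bytes of "Montréal"), strlen() reports 9 while the helper above reports 8. The literal is split in two so that the hex escape does not swallow the following "a".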

> A good string handling mechanism should have a good general default 
> representation (e.g. as UTF-${MachineWordBits}), but should likewise 
> allow access to "raw" byte strings, and be able to accommodate various 
> encodings.  Not that I'm really hankering to write any of that, mind you 
> ;-) Perhaps a better name for the external as I think of it would be 
> [any2bytes].  I'm perfectly willing to cede the "string" name to 
> something better (Martin's string patch comes to mind),

I gather that it'll take a long time before Pd gets Unicode support...

> ... except if you're building rsp. reading a persistent index for a
> large file, in which case tell() & seek() are likely to be a wee bit
> faster than parsing and counting variable-length-encoded characters ...

right.
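
A rough sketch of that kind of byte-offset index, using plain C stdio. The names build_line_index and read_line_at are invented for this example, and the stream is assumed to be opened in binary mode so that ftell() values are byte offsets. Jumping to a line is then a single fseek(), whatever the encoding of the text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: store the byte offset at which each line starts.
 * Returns the number of offsets stored (at most max). */
size_t build_line_index(FILE *f, long *offsets, size_t max)
{
    size_t n = 0;
    long start;
    int c, in_line = 0;

    assert(f != NULL && offsets != NULL);
    rewind(f);
    start = ftell(f);
    while ((c = getc(f)) != EOF) {
        if (!in_line) {            /* first byte of a new line */
            if (n < max)
                offsets[n++] = start;
            in_line = 1;
        }
        if (c == '\n') {           /* the next line starts right here */
            start = ftell(f);
            in_line = 0;
        }
    }
    return n;
}

/* Hypothetical helper: seek straight to a stored offset and read one
 * line, without decoding anything that comes before it. */
char *read_line_at(FILE *f, long offset, char *buf, int size)
{
    if (fseek(f, offset, SEEK_SET) != 0)
        return NULL;
    return fgets(buf, size, f);
}
```

The offsets count bytes, not characters, so multi-byte UTF-8 sequences on earlier lines shift the later offsets but never have to be parsed to reach them.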

  _ _ __ ___ _____ ________ _____________ _____________________ ...
| Mathieu Bouchard - tél:+1.514.383.3801, Montréal, Québec