[PD-dev] Pd Strings

Mathieu Bouchard matju at artengine.ca
Fri Nov 23 11:17:24 CET 2007


On Wed, 14 Nov 2007, Hans-Christoph Steiner wrote:

> Using arrays as strings is an interesting idea.  I don't think non-
> ascii charsets should be too big a deal, they are decently supported
> right now, without even trying :).  The Pd floats should store UTF-16
> fine, which really covers basically everything.  By the time UTF-32
> is used much, Pd will be using 64-bit floats.

UTF-8 and UTF-16 are encodings over a variable number of bytes. Storing 
code points as Pd floats is more like UCS-4 (equivalent to UTF-32), which 
is a Unicode encoding over a fixed number of bytes. The difference is that 
UCS-4 uses a uint32 per character instead of a float32, or instead of a 
float32 plus a type tag that is always set to say "this is a float", or 
that plus padding because of 64-bit mode.
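
To make the comparison concrete, here is a minimal Python sketch (Python 
only for brevity; this is not Pd code). It assumes one code point per 
float32 slot, simulated with the standard struct module, and compares that 
with a plain uint32 packing (UCS-4, i.e. UTF-32LE without a BOM). The 
function names are hypothetical and do not correspond to any Pd API, and 
the type tag and 64-bit padding mentioned above are not modeled.

```python
import struct

def to_float32_slots(text: str) -> bytes:
    """Store one code point per little-endian float32, fixed width like UCS-4.
    Hypothetical helper: Pd's type tag and 64-bit padding are not modeled."""
    return struct.pack(f"<{len(text)}f", *(float(ord(c)) for c in text))

def from_float32_slots(data: bytes) -> str:
    """Read the float32 slots back and rebuild the string from the code points."""
    count = len(data) // 4
    return "".join(chr(int(v)) for v in struct.unpack(f"<{count}f", data))

def to_ucs4(text: str) -> bytes:
    """Same code points as little-endian uint32: UCS-4, i.e. UTF-32LE, no BOM."""
    return struct.pack(f"<{len(text)}I", *(ord(c) for c in text))

if __name__ == "__main__":
    s = "Montr\u00e9al \u266a \U0001D11E"
    slots = to_float32_slots(s)
    # Both fixed-width layouts spend exactly 4 bytes per code point.
    print(len(s), len(slots), len(to_ucs4(s)))
    # True: every code point survives the float32 round trip.
    print(from_float32_slots(slots) == s)
```

The float32 and uint32 layouts are the same size; they differ only in how 
the bits of each slot are interpreted.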

A float32 represents every integer from 0 to 16777216 (2^24) exactly, so 
it covers anything a uint24 can hold. Unicode stops at U+10FFFF, so you 
never need to go beyond 21 bits, let alone 24. (UTF-8 spends extra bits in 
each byte to say whether the character continues into the next byte; those 
don't count here, as we'd be using a fixed size.)
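
The integer-range claim can be checked with a short Python sketch, again 
simulating float32 with the struct module (the helper names are 
hypothetical). A float32 has a 24-bit significand (23 stored bits plus an 
implicit leading bit), so integers are exact up to 2^24 and the first one 
lost is 2^24 + 1, while the highest code point, U+10FFFF, needs only 21 bits.

```python
import struct

MAX_CODE_POINT = 0x10FFFF  # highest code point defined by Unicode

def through_float32(n: int) -> float:
    """Round-trip an integer through a 32-bit float, as a Pd float would store it."""
    return struct.unpack("<f", struct.pack("<f", float(n)))[0]

def exact_in_float32(n: int) -> bool:
    """True if n comes back unchanged after being stored as a float32."""
    return through_float32(n) == n

if __name__ == "__main__":
    print(exact_in_float32(2**24))       # True: 16777216 is representable
    print(exact_in_float32(2**24 + 1))   # False: rounds to 16777216
    print(MAX_CODE_POINT.bit_length())   # 21
    # Every code point fits, with three bits of headroom to spare.
    print(all(exact_in_float32(cp) for cp in range(MAX_CODE_POINT + 1)))
```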

AFAIK, the only difference between UTF-8 and UTF-16 is that the latter 
tends to take less space when most of your characters are outside the 
ASCII range (0 to 127): characters from U+0800 to U+FFFF, such as CJK, 
take three bytes in UTF-8 but only two in UTF-16. This is because of the 
extra bits I'm talking about. In practice, anything you can do with UTF-16 
can already be done with UTF-8.
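
One way to see where UTF-16 actually saves space is to count bytes per 
code point. This Python sketch (helper names hypothetical) encodes the 
standard UTF-8 and UTF-16 length rules and checks them against Python's 
built-in codecs; utf-16-le is used so that no byte-order mark is counted.

```python
def utf8_len(cp: int) -> int:
    """Bytes UTF-8 needs for one code point (7, 11, 16 or 21 payload bits)."""
    if cp < 0x80:
        return 1
    if cp < 0x800:
        return 2
    if cp < 0x10000:
        return 3
    return 4

def utf16_len(cp: int) -> int:
    """Bytes UTF-16 needs: one 16-bit unit in the BMP, a surrogate pair above it."""
    return 2 if cp < 0x10000 else 4

def sizes(text: str) -> tuple[int, int]:
    """(UTF-8 bytes, UTF-16 bytes) for a whole string."""
    cps = [ord(c) for c in text]
    return sum(map(utf8_len, cps)), sum(map(utf16_len, cps))

if __name__ == "__main__":
    samples = ["hello", "Montr\u00e9al",
               "\u043f\u0440\u0438\u0432\u0435\u0442",  # Cyrillic
               "\u65e5\u672c\u8a9e",                    # CJK
               "\U0001D11E"]                            # above U+FFFF
    for s in samples:
        u8, u16 = sizes(s)
        # Cross-check the hand-written rules against the real codecs.
        assert u8 == len(s.encode("utf-8")) and u16 == len(s.encode("utf-16-le"))
        print(f"{s!r}: UTF-8 {u8} bytes, UTF-16 {u16} bytes")
```

With these samples, ASCII text is half the size in UTF-8, Cyrillic comes 
out equal, only the CJK sample is smaller in UTF-16 (6 bytes vs 9), and 
above U+FFFF both use 4 bytes.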

  _ _ __ ___ _____ ________ _____________ _____________________ ...
| Mathieu Bouchard - tél:+1.514.383.3801, Montréal QC Canada

