[PD-dev] strings

Bryan Jurish moocow at ling.uni-potsdam.de
Sat Dec 16 10:55:57 CET 2006


morning,

On 2006-12-16 01:40:03, Mathieu Bouchard <matju at artengine.ca> appears to 
have written:
> On Fri, 15 Dec 2006, Hans-Christoph Steiner wrote:
> 
> An advantage of the list-of-bytes approach is that, because each 
> element can hold a rather large integer, it can quickly be extended to 
> mean lists of characters, given a [utf8decode] and [utf8encode] to 
> turn bytes into characters and back; it's also a method that is 
> available now and reuses the existing list objects, and it supports 
> \0 (NUL) characters.
> 
> Disadvantages are that it takes more time to convert to C strings and 
> back, it isn't readable as text in .pd files, it takes up to 4 times 
> more space in .pd files and exactly 4 times more space in RAM (in the 
> case that just iso-latin-1 is used), and you can't make lists of 
> strings like that.

i count (sizeof(int)+sizeof(float)-1)*strlen(message) wasted bytes per 
string object, not counting the selector.  as i think we've discussed 
before, an ieee single-precision float should be able to losslessly 
encode a 24-bit integer, so three characters could be packed into each 
atom; that tweaks it down to 
(sizeof(int)+sizeof(float)-1)*strlen(message)/3 on average, but on my 
system (32 bit floats), that still amounts to one wasted byte per 
character for the representation, and it's hellishly cryptic to boot.
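
a rough sketch of the arithmetic above, in case it helps. everything 
named here (fake_atom, pack3, unpack3, overhead_bytes) is made up for 
illustration and is not part of pd or [pdstring]; in particular 
fake_atom is only a stand-in for a real atom, whose payload is a union 
and may be larger. overhead_bytes counts whole atoms, so its numbers are 
illustrative rather than a restatement of the averages quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* hypothetical stand-in for a pd atom: a type tag plus a float payload.
 * (a real atom carries a union, so its size may differ.) */
typedef struct {
    int   tag;
    float value;
} fake_atom;

/* pack up to 3 bytes (big-endian) into one float.  any integer below
 * 2^24 fits exactly in the significand of an ieee single, so the
 * conversion loses nothing. */
float pack3(const unsigned char *bytes, size_t n)
{
    uint32_t v = 0;
    assert(n <= 3);
    for (size_t i = 0; i < n; i++)
        v = (v << 8) | bytes[i];
    return (float)v;
}

/* recover 3 bytes (big-endian) from a float produced by pack3().
 * if fewer than 3 bytes were packed, the leading bytes come back as 0. */
void unpack3(float f, unsigned char out[3])
{
    uint32_t v = (uint32_t)f;
    out[0] = (unsigned char)((v >> 16) & 0xff);
    out[1] = (unsigned char)((v >> 8) & 0xff);
    out[2] = (unsigned char)(v & 0xff);
}

/* bytes spent on tag+float storage beyond the characters themselves,
 * when chars_per_atom characters are stored in each atom. */
size_t overhead_bytes(size_t nchars, size_t chars_per_atom)
{
    size_t natoms = (nchars + chars_per_atom - 1) / chars_per_atom;
    return natoms * (sizeof(int) + sizeof(float)) - nchars;
}
```

with one character per atom, overhead_bytes(n, 1) comes out to 
(sizeof(int)+sizeof(float)-1)*n, which is the first figure above; 
calling it with chars_per_atom = 3 shows what the packed variant costs 
on a given system.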

> (By the time we can have real strings, we can have nested-lists, and the 
> other way around, because they'd use the same mechanisms. whether it's 
> better to make them two types or one type, is a good question.)

... but then again, what else are ascii 0x1c-0x1f (28-31 = 
{fs,gs,rs,us}) for?  it's another ugly hack, would reserve some of the 
ascii range, and would require additional parsing objects (potentially 
constructable with [list]), but it's a possibility, should anyone 
actually need nested lists as strings...
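
for what it's worth, a minimal sketch of how such a separator hack might 
be parsed, assuming rs (0x1e) ends a sublist and us (0x1f) splits the 
items inside one. the function name get_field and the choice of which 
separator means what are assumptions made here for illustration; 
nothing like this exists in [pdstring].

```c
#include <stddef.h>

#define SEP_RS 0x1e  /* record separator: assumed to split sublists */
#define SEP_US 0x1f  /* unit separator: assumed to split items in a sublist */

/* find the index-th field of buf[0..len) when fields are delimited by
 * the byte sep.  on success, stores the field's offset and length and
 * returns 1; returns 0 if there is no such field. */
int get_field(const unsigned char *buf, size_t len, unsigned char sep,
              size_t index, size_t *start, size_t *fieldlen)
{
    size_t begin = 0;   /* offset where the current field starts */
    size_t cur = 0;     /* number of the current field */
    for (size_t i = 0; i <= len; i++) {
        /* a field ends at a separator or at the end of the buffer */
        if (i == len || buf[i] == sep) {
            if (cur == index) {
                *start = begin;
                *fieldlen = i - begin;
                return 1;
            }
            cur++;
            begin = i + 1;
        }
    }
    return 0;
}
```

nesting would then be two passes: get_field() with SEP_RS to pick a 
sublist, then get_field() with SEP_US on that slice to pick an item.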

please don't get me wrong: i'm all in favor of "real" strings, nested 
lists, and associative arrays - i wrote [pdstring] because i needed to 
send some generated text over OSC to someone who could only interpret 
ascii values: i'm glad if it's helpful to anyone besides myself, and i 
don't see much difficulty in adding support for low-level c-type string 
operations ([toupper], [tolower], at some later point maybe even 
regexes), but i can't bring myself to believe that the list-of-bytes 
approach is really the "right" way to do it, although i don't have a 
better idea at the moment...

marmosets,
	Bryan

-- 
Bryan Jurish                           "There is *always* one more bug."
jurish at ling.uni-potsdam.de      -Lubarsky's Law of Cybernetic Entomology



