[PD-dev] strings

Sat Dec 16 18:38:11 CET 2006

On Dec 16, 2006, at 4:55 AM, Bryan Jurish wrote:

> morning,
>
> On 2006-12-16 01:40:03, Mathieu Bouchard <matju at artengine.ca>  
> appears to have written:
>> On Fri, 15 Dec 2006, Hans-Christoph Steiner wrote:
>> An advantage of the list-of-bytes approach is that, because each
>> list element can hold a rather large integer, it can quickly be
>> extended to mean lists of characters, provided there is a
>> [utf8decode] and a [utf8encode] to turn bytes into characters
>> and back; also it's a method that is available now and reuses the
>> existing list objects; and it's a method that supports \0 (NUL)
>> characters.
>> Disadvantages are that it takes more time to convert to C strings
>> and back, it isn't readable as text in .pd files, it takes up to
>> 4 times more space in .pd files and exactly 4 times more space in
>> RAM (in the case that just iso-latin-1 is used), and also that you
>> can't make lists of strings like that.
>
> i count (sizeof(int)+sizeof(float)-1)*strlen(message) wasted bytes  
> per string object, not counting the selector.  as i think we've  
> discussed before, using ieee floats, which should be able to  
> losslessly encode a 24 bit integer, that can be tweaked down to  
> (sizeof(int)+sizeof(float)-1)*strlen(message)/3 on average, but on  
> my system (32 bit floats), that still amounts to one wasted byte  
> per character for the representation, and it's hellishly cryptic to  
> boot.
>
>> (By the time we can have real strings, we can have nested lists,
>> and the other way around, because they'd use the same mechanisms.
>> Whether it's better to make them two types or one type is a good
>> question.)
>
> ... but then again, what else are ascii 0x1c-0x1f (28-31 =  
> {fs,gs,rs,us}) for?  it's another ugly hack, would reserve some of  
> the ascii range, and would require additional parsing objects  
> (potentially constructable with [list]), but it's a possibility,  
> should anyone actually need nested lists as strings...
>
> please don't get me wrong: i'm all in favor of "real" strings,  
> nested lists, and associative arrays - i wrote [pdstring] because i  
> needed to send some generated text over OSC to someone who could  
> only interpret ascii values: i'm glad if it's helpful to anyone  
> besides myself, and i don't see much difficulty in adding support  
> for low-level c-type string operations ([toupper], [tolower], at  
> some later point maybe even regexes), but i can't bring myself to  
> believe that the list-of-bytes approach is really the "right" way  
> to do it, although i don't have a better idea at the moment...
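
For reference, the 24-bit packing mentioned above could look roughly like
this in C. This is only a sketch: pack3, unpack3 and pack_string are made-up
names, not anything in Pd or [pdstring], and a plain float stands in for a
Pd atom. It relies on IEEE single-precision floats representing every
integer below 2^24 exactly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: pack up to 3 bytes into one float as a
 * big-endian 24-bit integer. Missing bytes are padded with 0. */
static float pack3(const unsigned char *s, size_t n)
{
    uint32_t v = 0;
    for (size_t i = 0; i < 3; i++)
        v = (v << 8) | (i < n ? s[i] : 0u);
    return (float)v; /* exact, because v < 2^24 */
}

/* Hypothetical helper: recover the 3 bytes stored by pack3(). */
static void unpack3(float f, unsigned char out[3])
{
    assert(f >= 0.0f && f < 16777216.0f); /* must fit in 24 bits */
    uint32_t v = (uint32_t)f;
    out[0] = (unsigned char)((v >> 16) & 0xFF);
    out[1] = (unsigned char)((v >> 8) & 0xFF);
    out[2] = (unsigned char)(v & 0xFF);
}

/* Pack len bytes into floats, 3 bytes per float.
 * out must have room for (len + 2) / 3 floats.
 * Returns the number of floats written. */
static size_t pack_string(const unsigned char *s, size_t len, float *out)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i += 3)
        out[n++] = pack3(s + i, len - i);
    return n;
}
```

Note that the zero padding in the last float makes the original length
ambiguous for strings that themselves contain NUL bytes, so the length
would have to be carried separately in that case.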

One advantage of the list-of-bytes approach is that many C string
functions like toupper, tolower, strcat, strcmp, etc. would be pretty
easy to implement in Pd, rather than C. A regexp object would still
need to be written in C, but that would be pretty straightforward.
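
To make that concrete, here is the per-element logic such objects would
need, modelled in C since a patch doesn't paste well into mail. It is only
a sketch: bytes_toupper and bytes_cmp are made-up names, plain floats stand
in for Pd's list atoms, and the default "C" locale is assumed, so only
ASCII letters change case.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical: uppercase a list of byte values in place.
 * Elements outside 0..255 are left untouched. */
static void bytes_toupper(float *v, size_t n)
{
    assert(v != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        int c = (int)v[i];
        if (c >= 0 && c <= 255)
            v[i] = (float)toupper(c);
    }
}

/* Hypothetical: compare two byte lists the way strcmp orders strings,
 * using explicit lengths so that embedded 0 values are allowed.
 * Returns -1, 0 or 1. */
static int bytes_cmp(const float *a, size_t na, const float *b, size_t nb)
{
    size_t n = na < nb ? na : nb;
    for (size_t i = 0; i < n; i++) {
        if (a[i] != b[i])
            return a[i] < b[i] ? -1 : 1;
    }
    if (na == nb)
        return 0;
    return na < nb ? -1 : 1; /* the shorter list sorts first */
}
```

Each of these is a single pass over the list, which is why they look
feasible as abstractions built from the existing list objects.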

How about using a selector "string" for these lists?  I suppose that
could cause mayhem, since the list would then become a selector
series and run into all the vagaries of handling those.

.hc
------------------------------------------------------------------------

Man has survived hitherto because he was too ignorant to know how to  
realize his wishes.  Now that he can realize them, he must either  
change them, or perish.    -William Carlos Williams