[PD-dev] strings
Hans-Christoph Steiner
hans at eds.org
Sat Dec 16 18:38:11 CET 2006
On Dec 16, 2006, at 4:55 AM, Bryan Jurish wrote:
> morning,
>
> On 2006-12-16 01:40:03, Mathieu Bouchard <matju at artengine.ca>
> appears to have written:
>> On Fri, 15 Dec 2006, Hans-Christoph Steiner wrote:
>> An advantage of the list-of-bytes approach is that, because each
>> character can be represented by a rather large integer, it can
>> quickly be extended to work on lists of characters, provided there
>> is a [utf8decode] and a [utf8encode] to turn bytes into characters
>> and back; it is also a method that is available now and reuses the
>> existing list objects, and it supports \0 (NUL) characters.
>> Disadvantages are that it takes more time to convert to C strings
>> and back, it isn't readable as text in .pd files, it takes up to 4
>> times more space in .pd files and exactly 4 times more space in RAM
>> (in the case that just iso-latin-1 is used), and you can't make
>> lists of strings like that.
>
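To make the bytes-to-characters step concrete, here is a rough C sketch
of what a [utf8decode] might do internally. It is only an illustration:
the function name utf8_decode is made up, plain float arrays stand in
for Pd lists, every input value is assumed to be an integer in 0..255,
and overlong or surrogate encodings are not rejected. Unicode code
points top out at 0x10FFFF, which is below 2^24, so each one fits in a
32-bit float without loss.

```c
#include <stddef.h>

/* Hypothetical decoder: UTF-8 bytes (one per float) in, code points
 * (one per float) out. `out` must have room for n values.
 * Returns the number of code points written, or -1 on malformed input.
 * Assumes every input value is an integer in 0..255. */
static long utf8_decode(const float *in, size_t n, float *out)
{
    size_t i = 0;
    long count = 0;

    while (i < n) {
        unsigned b = (unsigned)in[i++];
        unsigned long cp;
        int extra;  /* number of continuation bytes still expected */

        if (b < 0x80)                { cp = b;        extra = 0; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; extra = 3; }
        else return -1;  /* stray continuation byte or invalid lead byte */

        while (extra-- > 0) {
            unsigned c;
            if (i >= n) return -1;              /* truncated sequence */
            c = (unsigned)in[i++];
            if ((c & 0xC0) != 0x80) return -1;  /* not a continuation byte */
            cp = (cp << 6) | (c & 0x3F);
        }
        out[count++] = (float)cp;
    }
    return count;
}
```

Note that a 0 in the input simply comes out as code point 0, so NUL
characters pass through like any other value.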
> i count (sizeof(int)+sizeof(float)-1)*strlen(message) wasted bytes
> per string object, not counting the selector. as i think we've
> discussed before, using ieee floats, which should be able to
> losslessly encode a 24 bit integer, that can be tweaked down to
> (sizeof(int)+sizeof(float)-1)*strlen(message)/3 on average, but on
> my system (32 bit floats), that still amounts to one wasted byte
> per character for the representation, and it's hellishly cryptic to
> boot.
>
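The three-characters-per-float trick mentioned above could look
something like the following C sketch. The names pack3 and unpack3 are
invented for illustration, and the code assumes 32-bit IEEE 754 floats,
whose 24-bit significand represents every integer up to 0xFFFFFF
exactly.

```c
#include <stdint.h>

/* Hypothetical helper: pack three bytes into one float.
 * The largest packed value is 0xFFFFFF (2^24 - 1), which a 32-bit
 * IEEE 754 float can hold without rounding. */
static float pack3(uint8_t b0, uint8_t b1, uint8_t b2)
{
    uint32_t v = ((uint32_t)b0 << 16) | ((uint32_t)b1 << 8) | (uint32_t)b2;
    return (float)v;
}

/* Hypothetical helper: recover the three bytes from a packed float.
 * Assumes f was produced by pack3(). */
static void unpack3(float f, uint8_t out[3])
{
    uint32_t v = (uint32_t)f;
    out[0] = (uint8_t)(v >> 16);
    out[1] = (uint8_t)(v >> 8);
    out[2] = (uint8_t)v;
}
```

One catch with this layout: when the string length is not a multiple of
three, the last float is padded, so the real length has to be carried
separately if NUL bytes are to remain legal content.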
>> (By the time we can have real strings, we can have nested-lists,
>> and the other way around, because they'd use the same mechanisms.
>> whether it's better to make them two types or one type, is a good
>> question.)
>
> ... but then again, what else are ascii 0x1c-0x1f (28-31 =
> {fs,gs,rs,us}) for? it's another ugly hack, would reserve some of
> the ascii range, and would require additional parsing objects
> (potentially constructable with [list]), but it's a possibility,
> should anyone actually need nested lists as strings...
>
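For what it's worth, the extra parsing for separator-delimited lists
need not be much. Below is a small C sketch, under the assumption that
fields inside a byte list are delimited by the ASCII unit separator
(0x1f). The name split_fields and the offset/length output format are
made up for this example, and plain float arrays stand in for Pd lists.

```c
#include <stddef.h>

#define SEP_US 0x1F  /* ASCII unit separator */

/* Hypothetical helper: locate the fields of a byte list (one byte per
 * float) that are delimited by `sep`. The offset and length of each
 * field are stored in start[] and len[], up to max_fields entries.
 * Returns the total number of fields found, which may exceed
 * max_fields. An empty list counts as one empty field. */
static size_t split_fields(const float *v, size_t n, float sep,
                           size_t *start, size_t *len, size_t max_fields)
{
    size_t count = 0, begin = 0, i;

    for (i = 0; i <= n; i++) {
        if (i == n || v[i] == sep) {   /* end of list or a separator */
            if (count < max_fields) {
                start[count] = begin;
                len[count] = i - begin;
            }
            count++;
            begin = i + 1;
        }
    }
    return count;
}
```

Nesting deeper than one level would need the other separators (fs, gs,
rs) as well, and any byte used this way can no longer appear as
ordinary content.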
> please don't get me wrong: i'm all in favor of "real" strings,
> nested lists, and associative arrays - i wrote [pdstring] because i
> needed to send some generated text over OSC to someone who could
> only interpret ascii values: i'm glad if it's helpful to anyone
> besides myself, and i don't see much difficulty in adding support
> for low-level c-type string operations ([toupper], [tolower], at
> some later point maybe even regexes), but i can't bring myself to
> believe that the list-of-bytes approach is really the "right" way
> to do it, although i don't have a better idea at the moment...
One advantage of the list-of-bytes approach is that many C string
functions, such as toupper, tolower, strcat, and strcmp, would be
pretty easy to implement as Pd patches rather than in C. A regexp
object written in C would also be pretty straightforward.
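For comparison, here is roughly what two of those operations amount to
when written in C against a list of floats. This is only a sketch: the
names bytes_toupper and bytes_compare are invented, plain float arrays
stand in for Pd lists, and each element is assumed to hold one
integer-valued byte.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: upper-case a list of byte values (one per
 * float) in place. Values outside 0..255 are left untouched. */
static void bytes_toupper(float *v, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++) {
        if (v[i] >= 0.0f && v[i] <= 255.0f)
            v[i] = (float)toupper((unsigned char)v[i]);
    }
}

/* Hypothetical helper: strcmp-like comparison of two byte lists.
 * Returns -1, 0, or 1. Unlike strcmp, a 0 (NUL) element does not end
 * the comparison, because the lengths are passed explicitly. */
static int bytes_compare(const float *a, size_t na,
                         const float *b, size_t nb)
{
    size_t n = na < nb ? na : nb;
    size_t i;
    for (i = 0; i < n; i++) {
        if (a[i] != b[i])
            return a[i] < b[i] ? -1 : 1;
    }
    if (na == nb)
        return 0;
    return na < nb ? -1 : 1;  /* the shorter list sorts first */
}
```

Both are plain per-element loops, which is the kind of thing the
existing list objects already handle.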
How about using a selector "string" for these lists? I suppose that
could cause mayhem since it would make the list into a selector
series and run into all the vagaries of handling them.
.hc
------------------------------------------------------------------------
Man has survived hitherto because he was too ignorant to know how to
realize his wishes. Now that he can realize them, he must either
change them, or perish. -William Carlos Williams