[PD-dev] Pd Strings

Mathieu Bouchard matju at artengine.ca
Fri Nov 23 11:08:51 CET 2007


On Tue, 13 Nov 2007, Chris McCormick wrote:

> 1. How do you propose to solve the 'spaces in file path' issue without a
> string type? Or are you content with that restriction?

Just fix atom_string().

> 2. How do you suggest that people deal with the symbol table pollution
> issue mentioned before on this list, when they are doing operations
> processing lots and lots of symbol-strings in Pd? Let me know if you
> want more information about this issue.

Deallocatable symbols are possible, but they require a new API for 
externals. Any external using the old API would automatically force 
symbols to stay allocated: that API gives externals the right to keep 
symbol pointers forever without ever declaring that they are in use, 
so Pd can't safely deallocate symbols that have passed through the 
existing API, even when they look unused.
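
A minimal sketch of what such a new API could look like, assuming 
reference counting. The names demo_symbol, demo_retain, demo_release and 
demo_count are hypothetical and are not part of Pd; the table is a plain 
linked list to keep the example short.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical reference-counted symbol; this is not Pd's symbol type. */
typedef struct demo_symbol {
    char *name;
    unsigned refs;
    struct demo_symbol *next;
} demo_symbol;

static demo_symbol *demo_table = NULL;   /* linked list, no hashing */

/* Look up or create a symbol and take one reference to it.
   Returns NULL if allocation fails. */
demo_symbol *demo_retain(const char *name)
{
    demo_symbol *s;
    size_t len;
    for (s = demo_table; s; s = s->next)
        if (strcmp(s->name, name) == 0) { s->refs++; return s; }
    s = malloc(sizeof *s);
    if (!s) return NULL;
    len = strlen(name) + 1;
    s->name = malloc(len);
    if (!s->name) { free(s); return NULL; }
    memcpy(s->name, name, len);
    s->refs = 1;
    s->next = demo_table;
    demo_table = s;
    return s;
}

/* Drop one reference; unlink and free the symbol once nobody holds it. */
void demo_release(demo_symbol *sym)
{
    demo_symbol **p;
    if (--sym->refs > 0) return;
    for (p = &demo_table; *p; p = &(*p)->next)
        if (*p == sym) { *p = sym->next; break; }
    free(sym->name);
    free(sym);
}

/* Number of symbols currently held in the table. */
size_t demo_count(void)
{
    size_t n = 0;
    const demo_symbol *s;
    for (s = demo_table; s; s = s->next) n++;
    return n;
}
```

In this sketch, an external written against the old API behaves like a 
caller that retains and never releases, so every symbol it has touched 
keeps a nonzero count and stays in the table.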

Using arrays or lists to represent strings means that strings can't be 
atoms, because lists can't be used as atoms either. In my view of how 
those lists should work, the problem of using lists as atoms is similar 
to the problem of deallocating symbols. An alternate solution would be to 
recursively copy all sublists of a message whenever an object wants to 
store a message. The latter is less efficient but easier to implement.
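
A rough sketch of the recursive-copy alternative, assuming a nested list 
type that Pd does not actually have. demo_item, demo_list, demo_list_copy 
and demo_list_free are hypothetical names used only for illustration.

```c
#include <stdlib.h>

/* Hypothetical nested list: each item is a float or another list. */
typedef struct demo_list demo_list;
typedef struct {
    int is_list;                          /* 0: float, 1: nested list */
    union { float f; demo_list *l; } v;
} demo_item;
struct demo_list { size_t n; demo_item *items; };

/* Free a heap-allocated list and every sublist it owns. */
void demo_list_free(demo_list *l)
{
    size_t i;
    if (!l) return;
    for (i = 0; i < l->n; i++)
        if (l->items[i].is_list) demo_list_free(l->items[i].v.l);
    free(l->items);
    free(l);
}

/* Deep copy: the result shares no memory with src, so the object that
   stores it does not depend on the sender keeping anything alive.
   Returns NULL if allocation fails. */
demo_list *demo_list_copy(const demo_list *src)
{
    size_t i;
    demo_list *dst = malloc(sizeof *dst);
    if (!dst) return NULL;
    dst->n = 0;
    dst->items = src->n ? malloc(src->n * sizeof *dst->items) : NULL;
    if (src->n && !dst->items) { free(dst); return NULL; }
    for (i = 0; i < src->n; i++) {
        dst->items[i] = src->items[i];
        if (src->items[i].is_list) {
            dst->items[i].v.l = demo_list_copy(src->items[i].v.l);
            if (!dst->items[i].v.l) { demo_list_free(dst); return NULL; }
        }
        dst->n = i + 1;   /* items 0..i are now fully owned by dst */
    }
    return dst;
}
```

The cost is visible in the loop: every store walks and reallocates the 
whole tree, which is why this is the less efficient of the two options.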

> 4. Can anyone else help me with a concise summary of other string/Pd
> issues I haven't thought of?

Some encodings of Unicode need a distinction between number of bytes and 
number of characters. Some other encodings of Unicode use 2 bytes for 
every character. Use of multiple encodings at once can be troublesome. Use 
of Unicode and non-Unicode encodings at once can be even more troublesome. 
Some code points in Unicode are not characters by themselves but modifiers 
of a neighbouring character (combining marks). Tk 8.4's Unicode support is 
not as complete as Tk 8.5's.
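
To illustrate the bytes-versus-characters distinction, here is a small 
sketch that assumes the text is valid UTF-8. The function name 
utf8_codepoints is made up for this example.

```c
#include <stddef.h>
#include <string.h>

/* Count Unicode code points in a NUL-terminated UTF-8 string by skipping
   continuation bytes (those of the form 10xxxxxx).
   Assumes the input is valid UTF-8; no validation is done. */
size_t utf8_codepoints(const char *s)
{
    size_t n = 0;
    for (; *s; s++)
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    return n;
}
```

For "tél" encoded as "t\xC3\xA9l", strlen() reports 4 bytes while 
utf8_codepoints() reports 3. For "e" followed by the combining acute 
accent, "e\xCC\x81", the counts are 3 bytes and 2 code points, yet the 
reader sees a single character, which is the modifier problem.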

Storing strings as lists of floats wastes a lot of memory: a character 
takes 1 to 3 bytes, perhaps 4, whereas an atom uses 8 bytes in 32-bit mode 
and 16 bytes in 64-bit mode. Arrays are more efficient, but you don't get 
below 4 bytes per character, and you can't pass them as messages.
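
The arithmetic can be checked with a stand-in struct. demo_atom below is 
an assumption made for illustration (a type tag plus a word holding either 
a float or a pointer); it is not copied from Pd's headers, and the exact 
sizes depend on the compiler and platform.

```c
#include <stddef.h>

/* Hypothetical stand-in for an atom: a type tag plus one word that holds
   either a float or a pointer. Typically 8 bytes with 32-bit pointers and
   16 bytes with 64-bit pointers, because of alignment padding. */
typedef struct {
    int tag;
    union { float f; void *p; } w;
} demo_atom;

/* Bytes needed for n characters stored one per atom. */
size_t bytes_as_atom_list(size_t n) { return n * sizeof(demo_atom); }

/* Bytes needed for n characters stored one per float array element. */
size_t bytes_as_float_array(size_t n) { return n * sizeof(float); }

/* Bytes needed for the same text as a NUL-terminated C string whose
   encoded length is nbytes. */
size_t bytes_as_c_string(size_t nbytes) { return nbytes + 1; }
```

With these definitions, a 100-character ASCII string costs 101 bytes as a 
C string, 100 * sizeof(float) as an array, and 100 * sizeof(demo_atom) as 
a list of atoms.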

Converting from float atom lists to C strings and back takes some time 
that could be saved by using the same internal format throughout (symbols, 
for example, already store their text that way).
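
The conversion in question looks roughly like this. It is a sketch that 
uses plain float arrays in place of real atom lists and assumes every 
float holds one byte value between 0 and 255; both function names are 
hypothetical.

```c
#include <stddef.h>

/* Pack n floats, each assumed to hold one byte value in 0..255, into a
   NUL-terminated C string. out must have room for n + 1 bytes. */
void floats_to_cstring(const float *in, size_t n, char *out)
{
    size_t i;
    for (i = 0; i < n; i++)
        out[i] = (char)(unsigned char)in[i];
    out[n] = '\0';
}

/* Unpack the bytes of a C string into floats, one byte per float.
   out must have room for strlen(s) floats. Returns the count written. */
size_t cstring_to_floats(const char *s, float *out)
{
    size_t n = 0;
    for (; s[n]; n++)
        out[n] = (float)(unsigned char)s[n];
    return n;
}
```

Each direction touches every character once, so any object that needs a C 
string internally pays this cost on every message in and every message 
out.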

In addition to problems with spaces, Pd objects often have trouble with 
braces, backslashes and such.

  _ _ __ ___ _____ ________ _____________ _____________________ ...
| Mathieu Bouchard - tél:+1.514.383.3801, Montréal QC Canada
