[PD] Strings.

Wed Feb 23 06:12:19 CET 2011

On Tue, 1 Feb 2011, Bryan Jurish wrote:

> True.  But while I can guarantee this for "string-like" operations, I
> can't seem to finagle it for pd, which insists on treating arrays (at
> least those defined at the patch level) as t_float[]s (looks like the
> culprit here is garray_save() calling binbuf_addv() to buffer the array

Oh, if you need to save the array using Pd's existing mechanisms, yeah, 
you shouldn't use anything other than Pd's way of storing data. (But do you 
really want to use arrays ?)

> reinterpret_cast<> doesn't exist.
> there's only (char*)expr  (cf. Kernighan & Ritchie, 1988) :-P
> sorry for being pedantic; you are of course correct that 
> reinterpret_cast<> is the C++ equivalent for what I'm doing; I just 
> think it's too much to type, which is yet another thing I dislike about 
> C++ ...

I'm just talking about it as a concept. I don't ever use that C++ keyword 
and I don't know what it's good for. I mean only the typecasts that change 
pointers into other, potentially unrelated pointer types. Those exist 
both in C and in C++, and they exist in C++ regardless of the actual use 
of the reinterpret_cast keyword (which seems like little more than a waste 
of characters in a text file).

I only use C++ for the shortcuts, not for the... longcuts.
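In plain C such a pointer-type change is just a cast. Here is a minimal sketch (the function names are hypothetical, not part of Pd) that writes text into the storage of a float array by viewing it through a (char *) pointer, which is what reinterpret_cast<char *> would spell in C++. Access through a character pointer is the one pointer-type change that the C aliasing rules allow for any object; reading a float array through, say, an (int *) is not guaranteed to work.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy as much of `text` as fits into the bytes of a float array,
   viewing that storage through a (char *) cast. The copied bytes are
   followed by a terminating 0. Returns the number of bytes copied. */
size_t pack_text_into_floats(float *storage, size_t nfloats, const char *text)
{
    char *bytes = (char *)storage;         /* the "reinterpret" cast */
    size_t cap = nfloats * sizeof(float);  /* bytes available */
    size_t n = strlen(text);

    assert(cap > 0);
    if (n > cap - 1)
        n = cap - 1;                       /* leave room for the 0 */
    memcpy(bytes, text, n);
    bytes[n] = '\0';
    return n;
}

/* Read the packed text back through the same kind of cast. */
const char *text_view_of_floats(const float *storage)
{
    return (const char *)storage;
}
```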

> anyhoo, ok: I can (ab)use typecasts using (t_word*) instead of 
> (t_float*).  My original gripes (1C) and (1D) still hold: this breaks on 
> save/load of patches, if "string" data is to be saved with its array.

So you are going to store one 'wchar' per float ? Fine. Now you 
don't need to store a string length, right ?
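Here is a sketch of the one-character-per-float layout. It assumes t_float is a 32-bit IEEE float and uses hypothetical function names. Such a float represents every integer up to 2^24 exactly, and the largest Unicode code point is 0x10FFFF, so the round trip is lossless. The element count of the array doubles as the string length, so no terminator or separate length field is stored. The sketch uses uint32_t rather than wchar_t because wchar_t is only 16 bits wide on some platforms.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store one Unicode code point per float. Every integer up to 2^24 is
   exact in a 32-bit float, and code points stop at 0x10FFFF. */
void codepoints_to_floats(const uint32_t *cp, size_t n, float *out)
{
    for (size_t i = 0; i < n; i++) {
        assert(cp[i] <= 0x10FFFF);
        out[i] = (float)cp[i];
    }
}

/* Inverse conversion; n is the array length, which is also the string
   length, so nothing else needs to be stored. */
void floats_to_codepoints(const float *in, size_t n, uint32_t *out)
{
    for (size_t i = 0; i < n; i++)
        out[i] = (uint32_t)in[i];
}
```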

> I still think the idea of using arrays for strings is intriguing, not 
> least for the sheer amount of abuse potential arising from combining 
> text bytes and audio signals in the same arrays...

You can already plug [#tabread] into [#to_s] to make nice symbols, or send 
a symbol to [#import] plugged into a [#tabwrite], to write funny things into 
tables. People can abuse things already. It's "fun".

> but I find it horribly nasty to waste more than half of the memory 
> allocated

Store the big text in an external that is shielded from the niceness of 
Pd.

> (ok, the strings-as-lists-of-floats waste even more, but that's explicit 
> and open about its hackery; putting byte values into floats under the 
> hood and calling the result "string" would be cunning, devious, and 
> underhanded hackery... or something like that)

How about using plain symbols as strings, and then performing a giant 
mark-and-sweep (split into realtime-friendly parts using clock_delay()) to 
delete all the unused ones ? (just kidding).

> Now we're on to method (2).  The show-stopper for me here is argument
> (2B): external APIs.  Under this method, every time I want my pd
> "string" as a C string, I have to explicitly convert it, and vice versa,
> which takes additional buffers, possibly (re-)allocations, and O(N)
> time.  This is all likely to happen only at the control level, so maybe
> that's not system critical either.

Messages are as realtime-critical as DSP, as they run in the same thread. 
If you don't run DSP, then messages may be realtime-critical anyway : it 
depends on when you need things to happen. If using live MIDI/OSC/etc 
control, then the stuff is realtime-critical. But it's very possible that 
O(N)-conversions aren't going to be noticeably slower for what you're 
doing. What percentage of the CPU would it really take ?
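If the O(N) conversion does show up as a cost, one common way to keep it cheap in the message thread is to reuse a single scratch buffer that only grows. In steady state, a conversion is then one linear pass with no allocation. This is a minimal sketch with hypothetical names. It assumes each float holds a byte value in the range 0 to 255.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical reusable scratch buffer; start it as {NULL, 0}. */
typedef struct {
    char *data;
    size_t cap;
} scratch_buf;

/* Convert n floats (one byte value 0..255 per float) into a
   NUL-terminated C string held in buf. The buffer is reallocated only
   when a longer string arrives. Returns buf->data, or NULL if the
   allocation fails. */
const char *floats_to_cstr(scratch_buf *buf, const float *in, size_t n)
{
    assert(buf != NULL);
    if (n + 1 > buf->cap) {
        size_t newcap = buf->cap ? buf->cap : 16;
        while (newcap < n + 1)
            newcap *= 2;                  /* amortized growth */
        char *p = realloc(buf->data, newcap);
        if (p == NULL)
            return NULL;
        buf->data = p;
        buf->cap = newcap;
    }
    for (size_t i = 0; i < n; i++)
        buf->data[i] = (char)((int)in[i] & 0xFF);
    buf->data[n] = '\0';
    return buf->data;
}
```

The same buffer can serve every conversion done by one object, so the allocation cost is paid once per "longest string seen so far" rather than once per message.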

> Being able to easily incorporate external string-processing APIs (e.g. 
> the C library string handling routines)

The C library's string-processing routines are really basic: they are about 
as easy to rewrite as they are to wrap. Thus you may as well rewrite them 
for whatever number type you choose to use.
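For example, strlen() and strcmp() rewritten for zero-terminated strings of floats take only a few lines each. This is a sketch with hypothetical names (fstrlen, fstrcmp). It assumes 0 is used as the terminator, as in C strings.

```c
#include <assert.h>
#include <stddef.h>

/* strlen() for zero-terminated float strings. */
size_t fstrlen(const float *s)
{
    size_t n = 0;
    assert(s != NULL);
    while (s[n] != 0.0f)
        n++;
    return n;
}

/* strcmp() for zero-terminated float strings: returns a negative value,
   0 or a positive value, like the C library version. */
int fstrcmp(const float *a, const float *b)
{
    assert(a != NULL && b != NULL);
    while (*a != 0.0f && *a == *b) {
        a++;
        b++;
    }
    return (*a > *b) - (*a < *b);
}
```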

>>> But GridFlow isn't vanilla either.
>> How many solutions do you want to reject ?
> n-1, for some natural number n.  sorry, can't be more specific yet.

If you absolutely want solutions to get into vanilla, there's only one 
person you have to talk to.

> Honestly, if I had a pressing need for handling large-ish amounts of 
> text data in pd, I would probably look to GridFlow.

For text processing, GridFlow's biggest problem is that it doesn't support 
arrays-of-strings of any kind. You can't make grids of variously sized 
grids, and you can't make lists of grids either. Lists of grids are like 
lists of lists, and since Pd doesn't have any reference-counting, it's 
quite futile for me to try to allow lists-of-grids now. (The other reason 
why you can't have lists-of-grids is that those atoms can't be assigned : 
they're not really grids, they're grid-sender handles.)
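To illustrate why reference counting matters here: once the same string (or grid) can sit in several lists at once, and those lists get copied around as messages, nothing knows when it is safe to free it unless the object counts its owners. Below is a minimal C sketch of a reference-counted string with hypothetical names. It is not how GridFlow or Pd store anything.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical reference-counted string. */
typedef struct {
    size_t refcount;
    size_t len;
    char data[];              /* C99 flexible array member */
} rc_str;

/* Create a string owned by one reference. */
rc_str *rc_str_new(const char *s)
{
    size_t len = strlen(s);
    rc_str *r = malloc(sizeof *r + len + 1);
    if (r == NULL)
        return NULL;
    r->refcount = 1;
    r->len = len;
    memcpy(r->data, s, len + 1);
    return r;
}

/* A list that stores the string takes a reference... */
rc_str *rc_str_retain(rc_str *r)
{
    r->refcount++;
    return r;
}

/* ...and drops it when done; the last release frees the string.
   Returns 1 if the string was freed, 0 otherwise. */
int rc_str_release(rc_str *r)
{
    assert(r->refcount > 0);
    if (--r->refcount == 0) {
        free(r);
        return 1;
    }
    return 0;
}
```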

> As it is, I usually wind up trying to get all my string processing done 
> outside of pd, and passing the data back and forth via OSC or (brace 
> yourself) the filesystem, where the "strings" wind up as symbols, and 
> put a good deal of stress on pd's symbol table, but hey... it explodes 
> only very rarely...

How many symbols is that ? Once you go many times over the size of the 
table, gensym() can get slower.
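Here is a simplified model of why that happens. The model is a fixed-size hash table with chaining, in which each lookup walks one bucket's chain. Once the table is many times full, the average cost therefore grows roughly with (number of symbols) / (number of buckets). The bucket count of 1024 and the djb2-style hash are assumptions of this sketch, not Pd's actual gensym() code. As with Pd's symbols, entries are never removed.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 1024   /* fixed table size: an assumption of this model */

typedef struct sym {
    char *name;
    struct sym *next;
} sym;

static sym *table[NBUCKETS];

/* djb2-style string hash, chosen only for this sketch. */
static unsigned hash(const char *s)
{
    unsigned h = 5381;
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h % NBUCKETS;
}

/* Intern a string: return the existing entry, or add a new one.
   *steps receives the number of entries compared, which grows with the
   length of the bucket's chain. Returns NULL if allocation fails. */
sym *intern(const char *s, size_t *steps)
{
    assert(s != NULL && steps != NULL);
    sym **head = &table[hash(s)];
    size_t n = 0;
    for (sym *p = *head; p != NULL; p = p->next) {
        n++;
        if (strcmp(p->name, s) == 0) {
            *steps = n;
            return p;
        }
    }
    *steps = n;
    size_t len = strlen(s);
    sym *p = malloc(sizeof *p);
    char *name = malloc(len + 1);
    if (p == NULL || name == NULL) {
        free(p);
        free(name);
        return NULL;
    }
    memcpy(name, s, len + 1);
    p->name = name;
    p->next = *head;      /* new symbols go at the head of the chain */
    *head = p;
    return p;
}
```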

> I have installed Martin's blobs.  It involved only a single patch to the
> pd core and a re-compilation.  No big deal.

It's already included in pd-extended. You don't have to recompile.

> Last time I tried to install GridFlow (this was years ago), I was bitten 
> by many (potential) dependencies and an old system, and gave up.

You would have been bitten by many other problems. GridFlow is a lot 
better now. For example, there's a real reference manual that doesn't 
suck, libruby has been kicked out, and we have binary distros.

> If I were to work on yet another string handler for pd, I'd like to make 
> sure that it's got as wide a potential user base as possible.  Not 
> everyone uses pd-extended.

Not every pd distro can load externals. Is that part of the potential user 
base ?

Why do you want a potential user base as large as possible ?

>> But I was mentioning GridFlow just to tell you what's in there. From
>> there, not only you can decide to use GridFlow, but if you decide to
>> instead modify Pd, you can look at how GridFlow does it : isn't that
>> interesting ?
> It is, although I usually try to avoid mucking about in other people's code.

I don't mean reading my code; I mean trying the software and seeing what it 
feels like to use something like it. But you may read the code too ;)

> I'd say those are good differentia, yes.  From where I'm standing, I'd
> put memory footprint and compatibility with existing 3rd-party APIs
> (e.g. conversion to/from (char*)) at the top of the list, and Martin's
> strings fulfill those criteria admirably.

But Martin's strings are not available to non-extended users that don't 
compile their own pd.
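The property that makes conversion to (char*) cheap can be shown with a length-prefixed byte string that always keeps one extra NUL after its data. Such a string can be handed to any char * API in O(1), without copying, as long as the data contains no embedded NUL bytes. This is a hypothetical layout for illustration, not the actual definition of Martin's blob type.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical length-prefixed byte string: len bytes of data followed
   by a terminating 0 that is not counted in len. Start as {0, NULL}. */
typedef struct {
    size_t len;
    unsigned char *data;
} bytestr;

/* Replace the contents with len bytes from src. Returns 0 on success,
   -1 if allocation fails (the old contents are then left untouched). */
int bytestr_set(bytestr *b, const void *src, size_t len)
{
    unsigned char *p = realloc(b->data, len + 1);
    if (p == NULL)
        return -1;
    memcpy(p, src, len);
    p[len] = 0;           /* keeps the data usable as a C string */
    b->data = p;
    b->len = len;
    return 0;
}

/* O(1) view as a C string: no copy and no allocation. */
const char *bytestr_cstr(const bytestr *b)
{
    assert(b->data != NULL);
    return (const char *)b->data;
}
```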

> The list-of-strings issue you brought up is very interesting indeed; I'm
> almost tempted to push that into a general discussion of nested data
> structures, but I think we've already drawn this thread far enough OT ;-)

The thread is OT ? So what ? If you care to write about it, then write 
about it, even if that means starting another thread.

  _______________________________________________________________________
| Mathieu Bouchard ---- tél: +1.514.383.3801 ---- Villeray, Montréal, QC