[PD] Concatenating two atoms into one?

Christof Ressi info at christofressi.com
Thu Oct 1 13:18:48 CEST 2020


> your computer memory will define the time when it will crash Pd (it will
> crash, when all the strings in the symboltable eat up all the memory
> available)
I think eating up all available memory is not a likely scenario on 
modern computers with a 64-bit address space and virtual memory.

The actual problem is that Pd's symbol table is implemented as a hash 
table with seperate chaining 
(https://en.wikipedia.org/wiki/Hash_table#Separate_chaining). This means 
that symbols which have the same hash value (= hash collision) are 
connected as a linked list. If you look up a such a symbol, you have to 
walk the linked list and do a string comparison for each element until 
you find a match.

The more symbols you add, the more hash collisions you get and the more 
symbols end up in the same bucket. In practice, this means that for a 
large number of elements, insertion and lookup becomes more and more 
expensive because the linked lists for each bucket grow larger and 
larger. While the hash lookup takes constant time "O(1)", the linked 
list traversal takes linear time "O(N)" (and is not very cache friendly).

Good hash table implementations mitigate this problems by a technique 
called "rehashing" where the hash table array is resized if the "load 
factor" (number of elements / number of buckets) exceed a certain 
threshold (= "load factor"). Then every item is hashed again and 
inserted back into the new hash table.

Unfortunately, Pd doesn't do any rehashing and it is certainly possible 
to significantly slow down Pd by flooding the symbol table. A common 
load factor threshold is 0.75, but if you create 10 000 unique symbols 
in Pd, the load factor would be ~10! Not very good...

In fact, I've already been thinking about improving the symbol table.

Christof

On 01.10.2020 10:28, IOhannes m zmoelnig wrote:
> On 2020-10-01 09:54, oliver wrote:
>> IOhannes m zmoelnig wrote:
>>> On 2020-10-01 09:22, hans w. koch wrote:
>>>> but be aware of the risks of invoking makefilname all too often.
>>> note that if you use dollsyms (as in `[$1$2(`) you are filling up the
>>> symbol table just as well.
>> i was just about to ask, if the attached modified patch would avoid that
>> problem, but you replied already.
>>
>> could you please clarify the used term "invoke" a bit ?
>> i guess the number of [makefilename] objects isn't the problem, but how
>> much/often its conversion mechanism is used, right ?
> yes (the latter).
>
>> does that mean that everytime a number->symbol conversion happens
>> (regardless how it is done) the symboltable is filled and will
> no.
> everytime a *new* symbol is created.
> the point of symbols (vs ordinary strings) is, that a single literal
> only needs to be stored once.
> so if you first create a string "rubadub" (however you do this), a new
> entry for the symbol 'rubadub' is created.
> now, if you concatenate the symbols 'rubad' and 'ub', the result is
> "rubadub", which already happens to be in the symbol table (and thus no
> new entry needs to be generated).
> for Pd these strings are *identical*.
> this is cool as we can really easily compare the two strings. if they
> occupy the same entry in the symbol table (which basically means, that
> Pd gets the same pointer for when turning the literal into a symbol),
> then the two strings are the same.
> so rather than having to compare each character of the string
> "sjfdjdasjfsfjrueincru057894_curtrfenr3ewf8354j3wp57jp3" with each
> character of "sjfdjdasjfsfjrueincru057894_curtrfenr3ewfB354j3wp57jp3" ,
> Pd only needs to compare two pointers - and this can be done in a single
> step on your CPU.
>
> the problem with generating symbol programmatically (e.g. by sending
> numbers to [makefilename %d]) is, that it is so super easy to generate
> lots and lots of (different) symbols.
>
>
>> eventually slow down or crash PD ?
>>
>> so, as a live example: writing number values to GUI labels dynamically
>> is a potentially dangerous thing ? what's the threshold there ?
> your computer memory will define the time when it will crash Pd (it will
> crash, when all the strings in the symboltable eat up all the memory
> available)
>
> as for the slow-down, why not simply create a patch that tests this for you?
>
> create labels with [makefilename label%08d] with the input ranging from
> 0...2000000 (or so; you'll notice when it gets slow).
> measure the time it takes to generate the symbols (well, measure the
> time it takes to generate 10000 symbols or)
>
>
>> or is there any way to clear the symboltable ?
> i think i covered this in another ("the" other) post quite recently.
>
>
> gfsdm
> IOhannes
>
>
> _______________________________________________
> Pd-list at lists.iem.at mailing list
> UNSUBSCRIBE and account-management -> https://lists.puredata.info/listinfo/pd-list





More information about the Pd-list mailing list