<div dir="ltr"><div>To add to the great explanations already given by Johannes and Christoph, Miller gave a seminar last spring quarter called "Inside Pd" where he explains the inner workings of Pd in great detail. <br></div><div>Here is the course site:</div><div><a href="http://msp.ucsd.edu/syllabi/206.20s/index.htm">http://msp.ucsd.edu/syllabi/206.20s/index.htm</a></div><div>and here the videos:</div><div><a href="http://msp.ucsd.edu/syllabi/206.20s/movies/">http://msp.ucsd.edu/syllabi/206.20s/movies/</a></div><div>In the video <a href="http://msp.ucsd.edu/syllabi/206.20s/movies/2b.apr10.mp4">2b.apr10.mp4</a> he explains the symbol storage strategy.</div><div><br></div><div>I'm very grateful to be able to access this content, thank you Miller!<br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">El jue., 1 oct. 2020 a las 13:21, Christof Ressi (<<a href="mailto:info@christofressi.com">info@christofressi.com</a>>) escribió:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">> your computer memory will define the time when it will crash Pd (it will<br>

> crash, when all the strings in the symboltable eat up all the memory<br>

> available)<br>

I think eating up all available memory is not a likely scenario on <br>

modern computers with a 64-bit address space and virtual memory.<br>

<br>

The actual problem is that Pd's symbol table is implemented as a hash <br>

table with seperate chaining <br>

(<a href="https://en.wikipedia.org/wiki/Hash_table#Separate_chaining" rel="noreferrer" target="_blank">https://en.wikipedia.org/wiki/Hash_table#Separate_chaining</a>). This means <br>

that symbols which have the same hash value (= hash collision) are <br>

connected as a linked list. If you look up a such a symbol, you have to <br>

walk the linked list and do a string comparison for each element until <br>

you find a match.<br>

<br>

The more symbols you add, the more hash collisions you get and the more <br>

symbols end up in the same bucket. In practice, this means that for a <br>

large number of elements, insertion and lookup becomes more and more <br>

expensive because the linked lists for each bucket grow larger and <br>

larger. While the hash lookup takes constant time "O(1)", the linked <br>

list traversal takes linear time "O(N)" (and is not very cache friendly).<br>

<br>

Good hash table implementations mitigate this problems by a technique <br>

called "rehashing" where the hash table array is resized if the "load <br>

factor" (number of elements / number of buckets) exceed a certain <br>

threshold (= "load factor"). Then every item is hashed again and <br>

inserted back into the new hash table.<br>

<br>

Unfortunately, Pd doesn't do any rehashing and it is certainly possible <br>

to significantly slow down Pd by flooding the symbol table. A common <br>

load factor threshold is 0.75, but if you create 10 000 unique symbols <br>

in Pd, the load factor would be ~10! Not very good...<br>

<br>

In fact, I've already been thinking about improving the symbol table.<br>

<br>

Christof<br>

<br>

On 01.10.2020 10:28, IOhannes m zmoelnig wrote:<br>

> On 2020-10-01 09:54, oliver wrote:<br>

>> IOhannes m zmoelnig wrote:<br>

>>> On 2020-10-01 09:22, hans w. koch wrote:<br>

>>>> but be aware of the risks of invoking makefilname all too often.<br>

>>> note that if you use dollsyms (as in `[$1$2(`) you are filling up the<br>

>>> symbol table just as well.<br>

>> i was just about to ask, if the attached modified patch would avoid that<br>

>> problem, but you replied already.<br>

>><br>

>> could you please clarify the used term "invoke" a bit ?<br>

>> i guess the number of [makefilename] objects isn't the problem, but how<br>

>> much/often its conversion mechanism is used, right ?<br>

> yes (the latter).<br>

><br>

>> does that mean that everytime a number->symbol conversion happens<br>

>> (regardless how it is done) the symboltable is filled and will<br>

> no.<br>

> everytime a *new* symbol is created.<br>

> the point of symbols (vs ordinary strings) is, that a single literal<br>

> only needs to be stored once.<br>

> so if you first create a string "rubadub" (however you do this), a new<br>

> entry for the symbol 'rubadub' is created.<br>

> now, if you concatenate the symbols 'rubad' and 'ub', the result is<br>

> "rubadub", which already happens to be in the symbol table (and thus no<br>

> new entry needs to be generated).<br>

> for Pd these strings are *identical*.<br>

> this is cool as we can really easily compare the two strings. if they<br>

> occupy the same entry in the symbol table (which basically means, that<br>

> Pd gets the same pointer for when turning the literal into a symbol),<br>

> then the two strings are the same.<br>

> so rather than having to compare each character of the string<br>

> "sjfdjdasjfsfjrueincru057894_curtrfenr3ewf8354j3wp57jp3" with each<br>

> character of "sjfdjdasjfsfjrueincru057894_curtrfenr3ewfB354j3wp57jp3" ,<br>

> Pd only needs to compare two pointers - and this can be done in a single<br>

> step on your CPU.<br>

><br>

> the problem with generating symbol programmatically (e.g. by sending<br>

> numbers to [makefilename %d]) is, that it is so super easy to generate<br>

> lots and lots of (different) symbols.<br>

><br>

><br>

>> eventually slow down or crash PD ?<br>

>><br>

>> so, as a live example: writing number values to GUI labels dynamically<br>

>> is a potentially dangerous thing ? what's the threshold there ?<br>

> your computer memory will define the time when it will crash Pd (it will<br>

> crash, when all the strings in the symboltable eat up all the memory<br>

> available)<br>

><br>

> as for the slow-down, why not simply create a patch that tests this for you?<br>

><br>

> create labels with [makefilename label%08d] with the input ranging from<br>

> 0...2000000 (or so; you'll notice when it gets slow).<br>

> measure the time it takes to generate the symbols (well, measure the<br>

> time it takes to generate 10000 symbols or)<br>

><br>

><br>

>> or is there any way to clear the symboltable ?<br>

> i think i covered this in another ("the" other) post quite recently.<br>

><br>

><br>

> gfsdm<br>

> IOhannes<br>

><br>

><br>

> _______________________________________________<br>

> <a href="mailto:Pd-list@lists.iem.at" target="_blank">Pd-list@lists.iem.at</a> mailing list<br>

> UNSUBSCRIBE and account-management -> <a href="https://lists.puredata.info/listinfo/pd-list" rel="noreferrer" target="_blank">https://lists.puredata.info/listinfo/pd-list</a><br>

<br>

<br>

<br>

_______________________________________________<br>

<a href="mailto:Pd-list@lists.iem.at" target="_blank">Pd-list@lists.iem.at</a> mailing list<br>

UNSUBSCRIBE and account-management -> <a href="https://lists.puredata.info/listinfo/pd-list" rel="noreferrer" target="_blank">https://lists.puredata.info/listinfo/pd-list</a><br>

</blockquote></div><br clear="all"><br>-- <br><div dir="ltr" class="gmail_signature"><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div>Maximiliano Estudies</div><div><i>VDT Referat Beschallung</i><br></div></div></div><div>+49 176 36784771</div><div><span><a href="http://omslo.com" target="_blank">omslo.com</a><br></span></div><div><span><a href="http://maxiestudies.com" target="_blank">maxiestudies.com</a> </span><br></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div>