[PD] Store data in memory more efficiently than in arrays

Fri Jan 14 09:11:45 CET 2022

On 1/13/22 15:41, José de Abreu wrote:
> Roman, maybe you could use iem16?
> 
[...]
> [table16] uses only 16bit (2bytes) to store the values, which is half of
> the memory."
> 
> So maybe it is exactly what you need?
> 

i don't really think so.
afaict, roman is mainly concerned about a *potential waste* of memory.
in his words:
 > Not that I ever hit a memory limit, I'm just curious.

so to answer roman's question first (possibly repeating what christof said):

of course it would be possible to store data in a more packed format, 
saving quite a lot of memory (a factor of 8 on a 64bit system!).
however, it would complicate the internal data handling a lot.
right now, there's a unified data model, where each message (or array) 
consists of *atoms* of a single size: this allows us to write code 
*once* for multiple cases (and whenever there's a bug, it only needs to 
be fixed once) rather than special-casing different data-layouts with 
similar but subtly different code (and whenever there's a bug, it needs 
to be fixed in each place separately, with the possibility to forget one 
of those places every time we do it...).
it also allows us to have "data structures".
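The trade-off can be illustrated outside Pd with Python's standard `array` module. This is only a sketch, not how Pd stores its tables: the table length, the helper names `write_packed`/`read_packed` and the [-1, 1] scaling are assumptions made for the example. Storing the same values as 16bit integers instead of 8-byte doubles saves a factor of 4, but every write and read now needs its own conversion code, which is the kind of per-format special case described above.

```python
from array import array

N = 44100  # hypothetical table length: one second at 44.1 kHz

# Uniform layout: every slot is an 8-byte double, the same size for every value.
uniform = array("d", [0.0] * N)

# Packed layout: every slot is a 2-byte signed integer.
packed = array("h", [0] * N)

FULL_SCALE = 32767  # largest int16 value; used to map the float range [-1, 1]


def write_packed(table, index, value):
    """Clip to [-1, 1] and quantize to int16: a step the double table never needs."""
    value = max(-1.0, min(1.0, value))
    table[index] = round(value * FULL_SCALE)


def read_packed(table, index):
    """Convert the stored int16 back to a float in [-1, 1]."""
    return table[index] / FULL_SCALE


uniform[0] = 0.5              # direct store, exact
write_packed(packed, 0, 0.5)  # clipped, scaled and rounded

saving = uniform.itemsize / packed.itemsize
print(f"{uniform.itemsize} vs {packed.itemsize} bytes per value, {saving:.0f}x smaller")
print(uniform[0], read_packed(packed, 0))  # the packed value is only approximately 0.5
```

Note that the packed table also loses precision: the value read back is close to, but not exactly, the value written.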

so we are trading (higher) memory consumption for (lower) code complexity.

this is a trade i would make any time (favouring more memory over more 
complex code).

obviously this comes with problems: since we need more memory, we might 
hit the physical RAM size, in which case we get into trouble.

but since - as you say - you've never actually hit the memory limit (and 
according to the number of times this is being discussed on the list, it 
seems that hardly anybody ever does), i'd classify this as "premature 
optimization".

sidenote: of course we are not alone.
take for example the most popular programming language¹ of the last few 
years:
a boolean value ideally requires a single bit to be stored accurately.
now in python (tested with Python 3.9 on a 64bit linux system), it doesn't 
take 1 bit, or even 1 byte, but 8 bytes.

 >>> import sys
 >>> sys.getsizeof([True]*3)-sys.getsizeof([True]*2)
8

at least, that is the cost if you store the boolean in an array; a single 
boolean value outside of an array carries some extra metadata, which 
brings it to 28 bytes in total²

 >>> sys.getsizeof(True)
28
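For contrast, here is a sketch of denser boolean storage in Python itself: the `bytes` type mentioned in footnote ² uses one byte per element, and packing flags into the bits of a byte reaches the ideal single bit per boolean. `pack_bits`/`unpack_bits` are hypothetical helpers written for this illustration, not standard-library functions, and the sizes assume 64bit CPython.

```python
import sys

flags = [True, False, True, True, False, False, True, False]

# A list stores one pointer per element: 8 bytes on 64bit CPython.
per_item_list = sys.getsizeof([True] * 3) - sys.getsizeof([True] * 2)

# bytes stores one byte per element (True/False become 1/0).
as_bytes = bytes(flags)
per_item_bytes = sys.getsizeof(bytes(3)) - sys.getsizeof(bytes(2))


def pack_bits(bits):
    """Pack a sequence of booleans into bytes, 8 booleans per byte."""
    out = bytearray((len(bits) + 7) // 8)
    for i, bit in enumerate(bits):
        if bit:
            out[i // 8] |= 1 << (i % 8)  # least significant bit first
    return bytes(out)


def unpack_bits(data, count):
    """Recover `count` booleans from the output of pack_bits()."""
    return [bool((data[i // 8] >> (i % 8)) & 1) for i in range(count)]


packed = pack_bits(flags)
print(per_item_list, per_item_bytes, len(as_bytes), len(packed))  # 64bit CPython: 8 1 8 1
```

The price is the same one described above for Pd: every piece of code that reads or writes the packed data has to go through these extra conversion steps.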

now about "iem16":

that library was indeed written to store data more efficiently.
i wrote it in 2003 or so (according to the VCS history and some comments 
in the code) to implement the live electronics for a piece that required 
a long (IIRC: 20 minutes) multichannel (IIRC: 4 channels) delayline.

back then, Pd was 32bit practically everywhere (the first amd64 
processor was released in 2003; the first version of windows to run in a 
64bit address space was released in 2005), so a single number stored in 
an array would require (only) 4 bytes.
if my math serves me right, the required delay line would need a 
laughable 200MiB of RAM.
otoh, a "PowerBook G4 (late 2002)" would be equipped with 256MB by 
default³. specs for PC laptops would probably be about the same.
so it was practically impossible to run a 200MB delayline on such 
systems (at least if you also wanted to run Pd and an OS), and we had to 
trim down the memory consumption so the patch could be used on the 
musicians' laptops.
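The estimate above can be checked with a little arithmetic: memory = duration × sample rate × channels × bytes per sample. This is a sketch that assumes a 44.1kHz sample rate and counts only raw sample data (neither is stated in the post); `delay_line_bytes` is a hypothetical helper.

```python
MIB = 1024 * 1024


def delay_line_bytes(minutes, channels, sample_rate, bytes_per_sample):
    """Raw sample memory for a delay line (ignores any per-array overhead)."""
    frames = minutes * 60 * sample_rate
    return frames * channels * bytes_per_sample


RATE = 44100  # assumed sample rate; the post does not state one

one_channel_32bit = delay_line_bytes(20, 1, RATE, 4)
four_channels_32bit = delay_line_bytes(20, 4, RATE, 4)
four_channels_16bit = delay_line_bytes(20, 4, RATE, 2)

for label, size in [
    ("1 channel, 32bit", one_channel_32bit),
    ("4 channels, 32bit", four_channels_32bit),
    ("4 channels, 16bit", four_channels_16bit),
]:
    print(f"{label}: {size / MIB:.1f} MiB")
```

Under these assumptions this prints about 201.9, 807.5 and 403.7 MiB: a single 20-minute channel of 32bit samples matches the ~200MiB figure, four such channels need four times as much, and 16bit storage as in [table16] halves either number.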

i don't remember having had a need for this library since then.

mtgasr
IOhannes

¹ according to PYPL: https://pypl.github.io/PYPL.html
² this argument is somewhat flawed, as python also gives us a 'bytes' 
class to store data in byte-arrays, where each byte consumes exactly 1 byte.
³ https://en.wikipedia.org/wiki/PowerBook_G4