[PD] bitmap sequencer and flext png library wrapper (fwd)

Sat Nov 30 02:35:59 CET 2002

I am sending this back on the list because my address gets blocked by your
mailserver for some obscure reason.

---------- Forwarded message ----------
Date: Fri, 29 Nov 2002 19:26:00 -0500 (EST)
From: Mathieu Bouchard <matju at sympatico.ca>
To: rfigura at metabit.com
Cc: Alexandre Castonguay <acastonguay at artengine.ca>
Subject: Re: [PD] bitmap sequencer and flext png library wrapper

On Fri, 29 Nov 2002, Robert Figura wrote:
> Okayokayokayokay i _will_ take a look.

I am very grateful for that =)

> #0  0x402a2802 in st_lookup (table=0x0, key=0x150 <Address 0x150 out of bounds>, value=0xbfffafc8) at st.c:253
> #1  0x40250265 in search_method (klass=1079530340, id=336, origin=0xbfffb004) at eval.c:250

It means that some ruby-class has been blown up by memory corruption.
"table" is a null-pointer when it should not be. The problem is not with
"key", because although "key" is probably a void* here, it's not being
used as a true pointer.
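
As a quick check of that reading, using only the numbers in the two frames
quoted above: the "address" shown for key in frame #0 is the method ID from
frame #1 written in hexadecimal. The "out of bounds" remark is what one would
expect when a debugger displays a small integer as if it were an address.

```ruby
# Values copied from the two backtrace frames quoted above.
key = 0x150   # frame #0: st_lookup(..., key=0x150 ...)
id  = 336     # frame #1: search_method(..., id=336, ...)

puts "key == id: #{key == id}"       # both frames carry the same number
puts format("id in hex: 0x%x", id)   # the hexadecimal spelling of the ID
```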

> #4  0x40254f3c in rb_eval (self=1079619060, n=0x405991c0) at eval.c:2544

Unfortunately Ruby's own line numbers are not easy to get at from GDB,
and I don't know how to do it. But:

> #28 0x4025a339 in rb_funcall (recv=1079670760, mid=9897, n=0) at eval.c:4688
> #29 0x40243cfa in gf_timer_handler__FP6_clockPv (alarm=0x0, obj=0x0) at base/bridge_puredata.c:384

It means that we're in GF's event loop.

> $ ltrace pd-debug -noaudio -nogui -open test.pd 2> trace.log

I never thought of using ltrace... thanks. Ruby makes a lot of
allocations/operations by itself that might not be (directly) relevant to
what GridFlow does.

> memcpy(0x0812a720, "method call on terminated object", 32) = 0x0812a720

This is a very important message. This means we have a dangling pointer
problem. However that might be only the symptom of some other
memory corruption.
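
A toy illustration of the dangling-reference pattern, in plain Ruby. This is
not GridFlow code and it does not reproduce the interpreter's internal check;
the Handle class, its "alive" flag and the registry are all made up for the
sketch. The point is only that a list keeps a reference to an object that has
already been torn down, and the error shows up at the next call, not at the
teardown.

```ruby
# Hypothetical stand-in for an object that a timer calls periodically.
class Handle
  def initialize(name)
    @name = name
    @alive = true
  end

  # Tear the object down; anything still holding a reference is now stale.
  def terminate
    @alive = false
  end

  def tick
    raise "method call on terminated object" unless @alive
    "#{@name} ticked"
  end
end

registry = [Handle.new("a"), Handle.new("b")]
registry[1].terminate   # "b" is gone, but the registry still lists it

registry.each do |handle|
  begin
    puts handle.tick
  rescue RuntimeError => e
    puts "error: #{e.message}"
  end
end
```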

> memcpy(0x08116e90, "Ruby-for-PureData:0", 19)     = 0x08116e90

This is GridFlow's toplevel trapping the error.

> ### in #4 ruby_frame says:
> file = "gridflow/base/MainLoop.rb"
> line = 178

oops, i forgot i could look at that information.
It is not very useful, as it points to a very ordinary expression.

> ahem. how to trace what is in ruby_block? (and why?) Okay i guess this
> year i will not find out what the function was libruby tried to call.
> (must have been tick, but that does not seem to matter) i see no way
> to spot the bug analytically. now lets attack the code. remove
> everything until we know what could be the problem...

I don't know that the problem is anywhere close to that spot in the code
anyway. What should be inspected thoroughly, imho, is anything running in
the initialization phase of gridflow, because i can get PD and jMax to
crash within a few seconds, without loading any patch, by setting the
timer rate high enough.

> first clear FObject_s_install_2 (hope this will avoid installing any
> objects in case one of the objects is defect.) - still coredump

By clearing, you mean disabling all of its code?

"install" is not all that may go wrong. There are other portions of
initialization that happen outside of "install".

"install" is just the final registering of a class with PD/jMax, but at
that point, the class is already registered with Ruby, and before that the
class may have had its own C++-level initialization.

>    btw: what is that timer_handler for?

It is mostly used for checking X11 events, TCP transfers (when they did
work), [rtmetro], and such.

> 16 hours later:
> Wherever i cut down a feature the error goes somewhere else. It could
> be ruby (i still have problems watching it's internal structures - and
> that's where the errors always come from)

That kind of wanton behaviour is typical of memory corruption bugs. It
looks as if the guilty piece of code is blameshifting, deceiving. A
self-concealing bug. One that says "I didn't do it!" and you don't know
where that voice comes from. This is why I called for help first.
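
A toy model of why the symptom wanders, assuming nothing about GridFlow's real
memory layout: one flat array stands in for the heap, "A" owns the first eight
cells and "B" the rest. The function name and the sizes are invented for the
sketch. When A writes one cell too far, it is B that later reads garbage, even
though B's own code is correct.

```ruby
# Toy model: one flat array stands in for the heap shared by two objects.
def simulate_overrun(cells_written)
  memory = Array.new(16, 0)   # cells 0..7 belong to A, cells 8..15 to B
  b_base = 8
  memory[b_base] = 42         # B stores a value it relies on later

  # A's writer fills its region; with cells_written > 8 it spills into B.
  cells_written.times { |i| memory[i] = 0xFF }

  memory[b_base]              # what B finds when it finally reads
end

puts simulate_overrun(8)   # A stays in bounds, B's value survives
puts simulate_overrun(9)   # A writes one cell too far, B's value is gone
```

Removing or moving B in such a layout does not fix anything: whatever ends up
next to A becomes the new victim, which matches the "error goes somewhere
else" behaviour described above.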

> Bridge is a confusing tohuwabohu, you never know what data is going to
> ruby (all these id's and lists). Ruby has no separate process so you
> never know if ruby was poison or pd. I am not going to learn ruby from
> inside.

The whole GridFlow/PD bridge is eight pages and is pretty close to being
the smallest piece of code possible that can do what it does.

ID's are possibly a leftover from older versions of Ruby. They're part of
the Ruby API and I can't do much about them. There are a few global
variables that could become of type ID which could save a few conversions.

Lists are also essential. Most GridFlow patches use lists as messages or
as arguments in a constructor. In addition, an argument list is a list,
and a named message is also a list...  The bridge supports recursive lists
in case one wants to code a Ruby object that accepts recursive lists.
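
A minimal sketch of what "a named message is also a list" and "recursive
lists" can look like on the Ruby side. The representation used here (nested
Arrays with a Symbol as selector) is an assumption for illustration, not
taken from the bridge's code, and the helper names and the :position selector
are hypothetical.

```ruby
# Split a named message into its selector and its arguments.
def split_message(list)
  selector, *args = list
  [selector, args]
end

# Nesting depth of a possibly recursive list (0 for a non-list atom).
def list_depth(value)
  return 0 unless value.is_a?(Array)
  1 + (value.map { |item| list_depth(item) }.max || 0)
end

message = [:position, 240, [320, [1, 2, 3]]]
selector, args = split_message(message)
puts selector.inspect    # the message name
puts args.inspect        # the argument list, itself containing lists
puts list_depth(args)    # how deeply the arguments nest
```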

> Giving up. I'd suggest a complete rewrite.

I can't consider a complete rewrite unless I have a good idea of how much
different it would be. The way it is now is quite close to the precise
idea I have of it. Of course I have plans about big changes to the
architecture, but those are not clear ideas as much as they are dreams.

Maybe one of the bigger things to consider would be the use of SWIG for
the C++ <-> Ruby bridging. However, some kind of feasibility study would
be required: I have the impression that SWIG won't support everything I
need, or that there will be some kind of significant overhead.

Another solution might be the use of Ruby/DL, which is a different
approach to the same problem of wrapping libraries, but I don't know how
ready that is, and whether it works with C++ code, and whether it works
with Ruby 1.6 (Ruby 1.7 is considered experimental).

> I didn't like the setjmp stuff either so bitmap_png.cc reads a png then
> copies the data and then destroys the pnglib structures in a single
> function.

That's okay. Most of GF's format-handlers do that too.

> The bitmap design is different (and so is the purpose). It works
> pixel-wise (!) so the only latency happens on read/write.

This sounds like GridFlow 0.1.x ... that was over a year and a half ago.

> as i can see pd uses backtracking in the scheduler. if you ask two
> questions the first one will get completed first. a delay(0) aborts
> descending and continues on the next iteration. so you could connect
> your operators using message delays.

I don't understand this paragraph.

> Another way would be to put the operators in a thread and let them
> asynchronously activate their outlets.

I'll begin considering threading eventually, but so many features have
priority that it is not likely you'll see anything about it in
2003; that is, unless I get superpowers, of course.

________________________________________________________________
Mathieu Bouchard                       http://artengine.ca/matju