[PD] utf8 over tcp

Tue Apr 21 00:52:35 CEST 2015

On 04/20/2015 04:05 AM, IOhannes m zmoelnig wrote:
> On 2015-04-19 22:45, Jonathan Wilkes via Pd-list wrote:
>> On 04/19/2015 03:46 PM, IOhannes m zmölnig wrote:
>>> since TCP/IP is totally packet agnostic, thou shalt not rely on its
>>> packetizing capabilities.
>>> if your receiver emits packets the same as you sent them, then you were
>>> merely lucky.
>> Thanks.  There's a string-decoder lib that puts a buffer's extra "tail"
>> bytes in a separate bin so they can be prepended to the next buffer.
>> That sounds like the way to go.
> any packetizing token ('"tail" bytes') needs to be excluded from or
> within from the actual payload.

Well, there are two things:
1) partial FUDI message at the end of a buffer
2) bytes of a UTF-8 character split across two different buffers

The algorithm I have been using can silently fail on #2,
but the solution to that is trivial.  Given that solution, I
can guarantee #1 is handled correctly.

I'm just curious why I'm unable to ever trigger a run-time
error for #2.
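
A minimal sketch of handling both cases together, assuming the
"string-decoder lib" mentioned above is Node's built-in string_decoder
module. FudiReader and its callback are hypothetical names, not code from
the GUI port, and the sketch ignores escaped semicolons inside a message.

```javascript
const { StringDecoder } = require('string_decoder');

// Hypothetical helper: reassembles FUDI messages from arbitrary chunks.
class FudiReader {
  constructor(onMessage) {
    // Case 2: the decoder holds back an incomplete UTF-8 sequence
    // until the rest of its bytes arrive in a later chunk.
    this.decoder = new StringDecoder('utf8');
    // Case 1: text after the last ';' is a partial FUDI message.
    this.pending = '';
    this.onMessage = onMessage;
  }

  // Feed one chunk exactly as the socket delivered it.
  push(chunk) {
    this.pending += this.decoder.write(chunk);
    const parts = this.pending.split(';'); // assumption: no escaped "\;"
    this.pending = parts.pop();            // keep the unterminated tail
    for (const part of parts) {
      const msg = part.trim();
      if (msg.length > 0) this.onMessage(msg);
    }
  }
}

// Usage: force a split inside "ö" (bytes 0xC3 0xB6) and inside a message.
const received = [];
const reader = new FudiReader((msg) => received.push(msg));
const bytes = Buffer.from('hello wörld;next mess', 'utf8');
reader.push(bytes.subarray(0, 8)); // ends with the lone lead byte 0xC3
reader.push(bytes.subarray(8));    // starts with the continuation byte 0xB6
reader.push(Buffer.from('age;\n', 'utf8'));
console.log(received); // [ 'hello wörld', 'next message' ]
```

The decoder only fixes the character boundary; the message boundary still
needs the separate "pending" string, which is why the two cases are kept
apart here.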

>
>> I wonder if localhost is doing optimizations that make the problem
>> unlikely to happen.
> do not trust it.

I don't.  I'm just making sure I understand how a vital part of the
gui message passing system works, by trying to break it based
on what I've learned from reading the spec on the underlying protocol.

> much broken multithreaded code used to work fine on single-core
> machines, because the single-threaded architecture made some problems
> unlikely to happen. and then we got multiple cores...
> (admittedly, this story is entirely made up by me; but there is no
> reason why it should not be true)
>
>> Right, but one can easily lose data before that part of the algorithm
>> happens.  Node's Buffer API makes this extremely easy to do.  The
>> string-decoder lib makes it easy to remedy, though.
> if you "can easily lose data" over your TCP/IP connection then something
> is seriously wrong with your setup.

I am not losing data over the TCP/IP connection.  I am potentially losing
data before tokenizing the FUDI messages.  That's because I was using a
"toString" method whose output is silently corrupted when bytes of a
multi-byte character straddle two incoming buffers of data.

Anyway, it's trivial to fix.  Again, I was just curious why I cannot trigger
a corruption at all, even when flooding the socket with messages that have
a large percentage of multi-byte characters.
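
Since the socket may never split a character in practice, the corruption
can instead be forced locally by cutting a buffer at every byte offset.
This is only a sketch: the sample text and the function names are made up
for illustration, and Node's string_decoder module is assumed to be the
decoder in question.

```javascript
const { StringDecoder } = require('string_decoder');

// Decode two chunks independently, the way a per-buffer toString() would.
function decodeNaively(a, b) {
  return a.toString('utf8') + b.toString('utf8');
}

// Decode two chunks through one decoder that carries state across them.
function decodeStateful(a, b) {
  const decoder = new StringDecoder('utf8');
  return decoder.write(a) + decoder.write(b) + decoder.end();
}

const text = 'pd ♥ ü;'; // "♥" is 3 bytes and "ü" is 2 bytes in UTF-8
const bytes = Buffer.from(text, 'utf8');

let naiveFailures = 0;
let statefulFailures = 0;
for (let cut = 0; cut <= bytes.length; cut++) {
  const a = bytes.subarray(0, cut);
  const b = bytes.subarray(cut);
  if (decodeNaively(a, b) !== text) naiveFailures++;
  if (decodeStateful(a, b) !== text) statefulFailures++;
}

// Only the cuts that land inside "♥" or "ü" break the naive version.
console.log({ naiveFailures, statefulFailures });
// { naiveFailures: 3, statefulFailures: 0 }
```

A test like this does not explain why localhost delivers whole characters,
but it does confirm the failure mode without depending on how the kernel
happens to chunk the stream.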

>
>
> (but anyhow, i don't know the actual problem you are working on, and i
> assumed that it is about your gui-rewrite. i'm nitpicky to make you
> avoid putting too much (== any) code depending on node.js features into
> the Pd-core side. but you are most likely aware of that anyhow)

It's difficult to imagine how I could hard code a node.js
dependency into it.

As for JavaScript...
I do have a handful of wrappers around sys_vgui in s_inter.c.
Those wrappers are tailored to being eval'd in JavaScript.
This is because it's easy and easily changed (should a better
approach implement itself).

The code to replace sys_vgui calls in g_*.c is quite similar to
pd_vmess.  There is one set of convenience functions I used
for garray, scalars, and iemgui props which rely on an ability
to send arrays of data to the gui.  While I think it'd be great
to do symmetric FUDI, there's a reason why arrays exist
and my sanity is one of them.

But if there are better approaches, it shouldn't take too long
for interested parties to revise the wrappers and demo the
benefits.

-Jonathan

>
>
> fgmasdr
> IOhannes