[PD-dev] UTF-8 for pd-devel (again)

Bryan Jurish moocow at ling.uni-potsdam.de
Thu Mar 12 23:27:03 CET 2009


moin Hans, moin list,

Alright, diff attached.  Should be pretty well human-readable, and
comments should be mostly self-explanatory (famous last words).  A few
changes (or lack thereof) deserve special mention, since they don't
directly relate to UTF-8 handling:

Tk proc pdtk_canvas_editval in src/pdtk_canvas.tcl:
+ SVN HEAD code caused a race condition / loop of 'pdsend pd_editmode'
calls as bound to the menu entry for 'Edit Mode', which made editing
impossible.  My reversion to the formerly commented-out code makes
editing possible again, but display and toggle are still buggy.  Oddly,
I didn't encounter this bug during the whole debugging process (despite
occasional edits of a test patch), but it does happen here now with both
a "clean" SVN HEAD version of pd-devel/0.41-4 as well as my cleaned-up
working copy.  Maybe all my debug code (not in the diff) was slowing
things down enough to avoid the loop...

Tk variable font_fixed_metrics in src/pd.tk:
+ the values here are inconsistent with those in sys_fontlist[] in
s_main.c.  My fonts looked much better when I changed the Tk variable to
agree with sys_fontlist[].  Also, with the current version, the
pixel-to-character translation code in g_rtext.c goes wrong from about
the 3rd or 4th line of text onward.  I haven't included this change in the
diff (except for comments where relevant), since I'm not sure whether
the inconsistency was intentional (e.g. to test font metric computation
and communication)...

New C sources src/s_utf8.h, src/s_utf8.c:
+ modified and trimmed versions of the Bezanson code.  I actually only
use 3 or 4 functions of these, so it could conceivably be trimmed down
even further; I just cut out the stuff I don't expect to need (escapes,
vprintf, ...).  Should remain separate source files though, since
they're needed in both g_rtext.c and g_editor.c.
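
For anyone who hasn't worked with this kind of code, the handful of
functions an editor needs mostly come down to stepping over one whole
character at a time.  The following is only a minimal sketch of that
idea, not the code from the diff: the sketch_ names are made up for
illustration, and it assumes the buffer holds well-formed UTF-8.

```c
#include <assert.h>
#include <stddef.h>

/* True if c is a UTF-8 continuation byte (bit pattern 10xxxxxx). */
int sketch_u8_is_cont(unsigned char c)
{
    return (c & 0xC0) == 0x80;
}

/* Advance the byte offset *i past one character of s (len bytes long):
 * skip the lead byte, then any continuation bytes that follow it. */
void sketch_u8_inc(const char *s, size_t len, size_t *i)
{
    assert(s != NULL && i != NULL);
    if (*i < len)
        (*i)++;
    while (*i < len && sketch_u8_is_cont((unsigned char)s[*i]))
        (*i)++;
}

/* Move the byte offset *i back to the start of the previous character. */
void sketch_u8_dec(const char *s, size_t *i)
{
    assert(s != NULL && i != NULL);
    if (*i > 0)
        (*i)--;
    while (*i > 0 && sketch_u8_is_cont((unsigned char)s[*i]))
        (*i)--;
}
```

Cursor movement and backspace can then work in whole characters rather
than single bytes.  The forward step takes an explicit length instead
of relying on a terminating NUL, on the assumption that the text buffer
may not be NUL-terminated.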

marmosets,
	Bryan

On 2009-03-12 02:03:04, Hans-Christoph Steiner <hans at eds.org> appears to
have written:
> 
> On Mar 11, 2009, at 6:07 PM, Bryan Jurish wrote:
> 
>> moin folks,
>>
>> I believe I've finally got pd-devel 0.41-4 using UTF-8 across the board.
>> So far, I've tested message boxes & comments (g_rtext), as well as
>> symbol atoms, and all seems good.  I think we can still expect goofiness
>> if someone names an abstraction using a multibyte character when the
>> filesystem isn't UTF-8 encoded (raw 8-bit works for me here too), but I
>> really don't want to open that particular can of worms.
>>
>> So I guess I have 2 questions:
>>
>> (1) what should I call the generic UTF-8 source files? (see my other
>> post)
> 
> I don't quite follow...  do you mean there are separate .c files for the
> UTF-8 code?

Yes, there is 1 .c and 1 .h file for utf8 handling, in the diff as
s_utf8.[ch].  I was tempted to use "z_", but didn't want to be the first ;-)

> I guess I am not sure which post you mean.

The one with subject "Pd C source prefix conventions?".

>> (2) shall I commit these changes to pd-devel/0.41-4, or somewhere else,
>> or just post a diff (ca. 33k, ought to be easier to read now; I've tried
>> to follow the indentation conventions of the source files I modified)?
> 
> If there are major changes to the C code, we'll have to run that by
> Miller.  He's already more or less accepted the existing pd-devel Tcl
> code, so he might be willing to accept these C changes as well.  I think
> it's really important work, especially since Pd-devel also has localized
> menus and text.

I'm not sure how "major" the changes are... they're pretty much
restricted to a few lines in g_editor.c, and a smattering of changes in
g_rtext.c.  Most heavily hit was rtext_senditup(), which now has to keep
track of both byte offsets (for C) and logical character offsets (for
Tk).  I renamed a few local variables there to help keep track of what
was being counted where.   I haven't added any fields to any built-in
structs though, nor changed the interpretation of any existing fields,
parameters, etc., so everything else ought to be pretty much compatible.
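
To illustrate the byte-vs-character bookkeeping, here is a rough sketch
of converting between the two kinds of offset.  It is not lifted from
the diff; the function names are hypothetical, and it assumes
well-formed UTF-8 in a buffer of known length.

```c
#include <assert.h>
#include <stddef.h>

/* Character index corresponding to a byte offset: count the bytes
 * before nbytes that start a character (i.e. are not 10xxxxxx). */
size_t sketch_u8_charnum(const char *s, size_t nbytes)
{
    size_t i, nchars = 0;
    assert(s != NULL);
    for (i = 0; i < nbytes; i++)
        if (((unsigned char)s[i] & 0xC0) != 0x80)
            nchars++;
    return nchars;
}

/* Byte offset at which character number charnum starts, clamped to
 * len if the buffer holds fewer characters than that. */
size_t sketch_u8_offset(const char *s, size_t len, size_t charnum)
{
    size_t i = 0;
    assert(s != NULL);
    while (charnum > 0 && i < len) {
        i++;                                   /* lead byte */
        while (i < len && ((unsigned char)s[i] & 0xC0) == 0x80)
            i++;                               /* continuation bytes */
        charnum--;
    }
    return i;
}
```

The C side would index the buffer with the byte offset, while whatever
goes out to Tk (selection bounds, cursor position) would use the
character index.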

> If you have the diff, maybe post it here and let's start there.

here it is...

marmosets,
	Bryan

-- 
Bryan Jurish                           "There is *always* one more bug."
jurish at ling.uni-potsdam.de      -Lubarsky's Law of Cybernetic Entomology
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: pd-devel-0.41.4-src.utf8-moo-2009-03-12.diff
URL: <http://lists.puredata.info/pipermail/pd-dev/attachments/20090312/08bdc985/attachment-0001.txt>