[PD] locales for Pd WAS: japanese encoded chars in PD

Bryan Jurish moocow at ling.uni-potsdam.de
Fri Feb 13 10:38:56 CET 2009


moin all,

On 2009-02-13 03:14:20, Hans-Christoph Steiner <hans at eds.org> appears to
have written:
> On Thu, 12 Feb 2009, Bryan Jurish wrote:
>> Are we certain that Tk is actually translating at all, and not just
>> using some 8-bit default like latin-1 when it finds non-UTF-8 input?  I
>> ask because that's what Perl does by default, a behavior which continues
>> to give me headaches.  In Perl, each string has its own internal "utf8"
>> flag which tells you whether Perl is currently thinking of that string
>> as a raw byte-string in some unknown encoding or as a "native" (utf8)
>> character string... I assume Tcl/Tk does something similar, but don't
>> know how to test for this property there.
> 
> Here's the doc that I read on this topic, but it probably doesn't have
> the level of detail that you require:
> 
> http://tcl.tk/man/tcl8.5/TclCmd/fconfigure.htm#M8

Had a look at that last night, but the 'fconfigure' command only applies
to Tcl streams ("channels").  These are analogous to the PerlIO layer,
which I abhor and try my best to avoid, as it doesn't provide a
sufficient level of control for most of my purposes.  fconfigure would
be ok for Pd-devel if we say we're dealing exclusively with utf-8...
but then again, I don't know if Tcl channels are used at all by the
GUI... maybe on the socket to the backend, but that's probably it.
IMHO it's safer to explicitly generate byte strings in a known encoding
and just pass those around.

Also useful is the 'encoding' command family ('encoding convertfrom',
'encoding convertto', 'encoding names', 'encoding system').  Tried this
with some explicit escapes as well as a tester widget from
http://en.wikibooks.org/wiki/Tcl_Programming/Internationalization, and I
get decent display (Japanese still doesn't display with any Tk fonts I
tried, but I think that's just a font problem).  Also tested the bind
substitutions with a dummy "puts" script, and managed to get real utf-8
sent out over the stdout channel for keyboard input.  Still not 100%
sure how well it's working, since my keyboard only produces latin-1
symbols (maybe I'll hack my xmodmap for some real testing ;-)
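
For anyone who wants to poke at the 'encoding' commands themselves, here
is a minimal sketch for a plain tclsh 8.5 (no Tk needed).  It is not the
exact test described above; the variable names are made up for
illustration, and the last step only imitates the kind of 8-bit latin-1
fallback asked about earlier in the thread:

```tcl
# Sketch only: variable names are illustrative, not taken from any Pd code.

# A character string containing one non-ASCII character (U+00E9).
set chars "caf\u00e9"

# Encode it into an explicit UTF-8 byte string.
set bytes [encoding convertto utf-8 $chars]

# 4 characters, but 5 bytes: U+00E9 takes two bytes in UTF-8.
puts "chars: [string length $chars]"
puts "bytes: [string length $bytes]"

# Hex dump of the encoded bytes.
binary scan $bytes H* hex
puts "hex:   $hex"

# Decoding with the matching encoding round-trips cleanly...
set ok [encoding convertfrom utf-8 $bytes]
puts "round trip ok: [string equal $ok $chars]"

# ...while decoding the same bytes as latin-1 yields one character per
# byte, i.e. the string silently grows and the accented letter is mangled.
set wrong [encoding convertfrom iso8859-1 $bytes]
puts "latin-1 length: [string length $wrong]"
```

Comparing the two lengths at the end is a cheap way to spot a string
that was decoded with the wrong 8-bit encoding somewhere along the way.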

Unfortunately, I still haven't found a way to get Tcl to tell me what
encoding (if any) it thinks a given string is using, analogous to the
Perl predicate "utf8::is_utf8($string)".  Maybe Tcl doesn't track this
information on a per-string level at all, but assumes [encoding system]
for all strings?  That seems pretty inflexible to me, but after another
look at http://www.tcl.tk/man/tcl8.5/TclCmd/encoding.htm , it does
indeed seem to be the case.  So I guess the only safe way to handle
things is (as you suggest) to select an internal encoding (e.g. UTF-8)
and enforce its use with {encoding system "utf-8"}, and possibly
{fconfigure $ch -encoding "utf-8"} for whatever channels we want. The
fconfigure manpage says the default channel encoding is [encoding
system]; but I suspect that perhaps it's really the value of [encoding
system] at the time of the channel's opening which has an effect, so we
either have to make some accommodations for the standard channels
(stdin,stdout,stderr), or just leave that up to Tcl (which probably
defaults to the current locale's LC_CTYPE, but I haven't tested that
yet)...
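
A rough sketch of how one might check that suspicion in tclsh 8.5,
assuming only the standard 'encoding system' and 'fconfigure' commands
and touching nothing but stdout.  I am deliberately not claiming what
the middle line prints; that is the open question:

```tcl
# Sketch only; run in a stock tclsh, not inside the Pd GUI.

# What Tcl picked up from the environment at startup.
puts "system encoding at startup: [encoding system]"
puts "stdout encoding at startup: [fconfigure stdout -encoding]"

# Change the system encoding, then look at the already-open channel.
# If stdout still reports the old value here, the channel encoding was
# fixed when the channel was opened.
encoding system utf-8
puts "stdout after 'encoding system': [fconfigure stdout -encoding]"

# Either way, set the channel encoding explicitly instead of relying
# on whatever default it inherited.
fconfigure stdout -encoding utf-8
puts "system encoding now: [encoding system]"
puts "stdout encoding now: [fconfigure stdout -encoding]"

# U+00E9 should now leave the process as the two bytes C3 A9.
puts "caf\u00e9"
```

Piping the output through a hex dumper (e.g. "tclsh test.tcl | od -c",
where test.tcl is just a placeholder file name) shows which bytes
actually hit the wire.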

> As for Tk hacking for Pd, a big part of the pd-devel effort is to make
> the Tk GUI code readable, and even extendable!  Feel free to hit me with
> questions, either here, or I am in #dataflow quite a bit these days.

Groovy.  I don't think I'll make the devel meeting today, but it's
beginning to look as if I've got a bit of a bug in my bonnet about this ;-)

marmosets,
	Bryan

-- 
Bryan Jurish                           "There is *always* one more bug."
jurish at ling.uni-potsdam.de      -Lubarsky's Law of Cybernetic Entomology




