[PD] locales for Pd WAS: japanese encoded chars in PD
Hans-Christoph Steiner
hans at eds.org
Fri Feb 13 22:59:41 CET 2009
On Feb 13, 2009, at 4:38 AM, Bryan Jurish wrote:
> moin all,
>
> On 2009-02-13 03:14:20, Hans-Christoph Steiner <hans at eds.org>
> appears to have written:
>> On Thu, 12 Feb 2009, Bryan Jurish wrote:
>>> Are we certain that Tk is actually translating at all, and not just
>>> using some 8-bit default like latin-1 when it finds non-UTF-8
>>> input? I ask because that's what Perl does by default, a behavior
>>> which continues to give me headaches. In Perl, each string has its
>>> own internal "utf8" flag which tells you whether Perl is currently
>>> thinking of that string as a raw byte-string in some unknown
>>> encoding or as a "native" (utf8) character string... I assume
>>> Tcl/Tk does something similar, but don't know how to test for this
>>> property there.
>>
>> Here's the doc that I read on this topic, but it probably doesn't
>> have the level of detail that you require:
>>
>> http://tcl.tk/man/tcl8.5/TclCmd/fconfigure.htm#M8
>
> Had a look at that last night, but the 'fconfigure' command only
> applies to Tcl streams (analogous to the PerlIO layer, which I abhor
> and try my best to avoid, as it doesn't provide a sufficient level of
> control for most of my purposes). fconfigure would be ok for Pd-devel
> if we say we're dealing exclusively with utf-8... but then again, I
> don't know if Tcl streams ("channels") are used at all by the GUI...
> maybe on the socket to the backend, but that's probably it; IMHO it's
> safer to explicitly generate byte strings in a known encoding and
> just pass those around.
>
> Also useful is the 'encoding' command family ('encoding convertfrom',
> 'encoding convertto', 'encoding names', 'encoding system'). Tried
> this with some explicit escapes as well as a tester widget from
> http://en.wikibooks.org/wiki/Tcl_Programming/Internationalization,
> and I get decent display (Japanese still doesn't display with any Tk
> fonts I tried, but I think that's just a font problem). Also tested
> the bind substitutions with a dummy "puts" script, and managed to get
> real utf-8 sent out over the stdout channel for keyboard input. Still
> not 100% sure how well it's working, since my keyboard only produces
> latin-1 symbols (maybe I'll hack my xmodmap for some real testing ;-)
>
> Unfortunately, I still haven't found a way to get Tcl to tell me what
> encoding (if any) it thinks a given string is using, analogous to the
> Perl predicate "utf8::is_utf8($string)". Maybe Tcl doesn't track this
> information on a per-string level at all, but assumes [encoding
> system] for all strings? That seems pretty inflexible to me, but
> after another look at
> http://www.tcl.tk/man/tcl8.5/TclCmd/encoding.htm , it does indeed
> seem to be the case. So I guess the only safe way to handle things is
> (as you suggest) to select an internal encoding (e.g. UTF-8) and
> enforce its use with {encoding system "utf-8"}, and possibly
> {fconfigure $ch -encoding "utf-8"} for whatever channels we want. The
> fconfigure manpage says the default channel encoding is [encoding
> system]; but I suspect that perhaps it's really the value of
> [encoding system] at the time of the channel's opening which has an
> effect, so we either have to make some accommodations for the
> standard channels (stdin,stdout,stderr), or just leave that up to Tcl
> (which probably defaults to the current locale's LC_CTYPE, but I
> haven't tested that yet)...
>
>> As for Tk hacking for Pd, a big part of the pd-devel effort is to
>> make the Tk GUI code readable, and even extendable! Feel free to hit
>> me with questions, either here, or I am in #dataflow quite a bit
>> these days.
>
> Groovy. I don't think I'll make the devel meeting today, but it's
> beginning to look as if I've got a bit of a bug in my bonnet about
> this ;-)
Hey,
It's good to see someone willing to dive in deep. It'll be great to
have full UTF-8 support. Patko and I were looking into how to do it
on the C side; I think what you mentioned, using locale.h and
setlocale(), should be enough. Maybe patko will chime in with some
details.
.hc
>
>
> marmosets,
> Bryan
>
> --
> Bryan Jurish                    "There is *always* one more bug."
> jurish at ling.uni-potsdam.de   -Lubarsky's Law of Cybernetic Entomology
----------------------------------------------------------------------------
Programs should be written for people to read, and only incidentally
for machines to execute.
- from Structure and Interpretation of Computer Programs